Factor Analysis, Long Tails, and Stacks

November 3, 2010

Factor Analysis and Scree Diagrams

In my last blog post, Ordinals for Product Managers, I described a scree diagram as the product of factor analysis. Factor analysis is a form of multivariate analysis. It builds a collection of factors that together account for some fixed percentage of the total variance. The analysis is computationally intensive, so beyond a threshold of 85 percent the effort can exceed the return.

Factor analysis to a threshold of 85 percent

Factors accounting for 85 percent of the variance among the data considered.
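The 85 percent cutoff can be sketched in a few lines. This is only an illustration of the stopping rule, not a factor analysis itself: it assumes the variance contributed by each factor has already been computed by some statistics package. The function name and the example numbers are hypothetical.

```python
def factors_for_threshold(variances, threshold=0.85):
    """Return how many of the largest factors are needed to reach the threshold.

    `variances` holds the variance attributed to each factor (assumed to be
    precomputed elsewhere); `threshold` is the fraction of total variance to cover.
    """
    ordered = sorted(variances, reverse=True)  # largest factor first, as in a scree diagram
    total = sum(ordered)
    if total <= 0:
        return 0
    running = 0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)


# Hypothetical variance contributions, chosen to sum to 100 for readability.
example_variances = [40, 25, 15, 10, 5, 3, 2]
kept = factors_for_threshold(example_variances)
print(kept)  # the first four factors cover 90 percent, the first three only 80
```

Raising the threshold adds factors from the flat end of the scree diagram, which is where the extra effort buys the least.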

If we consider a single minimal marketable functionality (MMF) to be the entity under study, and order the use frequencies of the features in the GUI for that MMF, we would end up with a discrete representation, a scree diagram, that depicts the amount of attention the user grants to each feature or control. Use frequencies would order the features by their use and contribution to value, and would show how often a feature was “bought,” or clicked on.
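Ordering features by use frequency is easy to sketch, assuming the application logs one entry per click. The click log, the feature names, and both function names below are made up for illustration.

```python
from collections import Counter


def scree_order(click_log):
    """Count clicks per feature and order features from most to least used."""
    return Counter(click_log).most_common()


def scree_bars(ordered, symbol="#"):
    """Render one text bar per feature, a rough stand-in for a scree diagram."""
    return [f"{feature:<8}{symbol * count} {count}" for feature, count in ordered]


# Hypothetical click log for a single MMF: one entry per click on a feature.
clicks = ["paste", "copy", "paste", "find", "copy", "paste", "paste", "spell"]
ordered = scree_order(clicks)
for line in scree_bars(ordered):
    print(line)
```

Each bar is one feature. Reading top to bottom gives the same ordering a scree diagram gives left to right.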

A single MMF would translate into one or more tasks, or jobs to be done. For an MMF to be marketable, it would have to focus on a user’s job to be done (a carried, or content, task) rather than on strictly architectural or infrastructural jobs (carrier tasks). Administration might be necessary, but nobody buys your software to administer it. Marketable implies benefit, which in turn implies task performance. If the unit of study were tasks, then the factor analysis would organize those tasks by task performance frequency.

As additional MMFs are developed and released, the use frequencies would change, and the order of the frequencies as factors would reflect these changes in use frequencies.

The Long Tail

The Long Tail is a continuous depiction of a factor analysis. It is also known as a power law or Pareto distribution. The Long Tail was conceptualized as an organization of markets by size. Use frequencies order marketed functionality similarly. Every click is sold. Every click shows up as a data point in a factor analysis, in a scree diagram, or in a long tail.

A Pareto distribution has some split, like the ratios 80:20 or 85:15. I’ve used the 20:80 split. In the long tail, the 20 percent is the hits, or front list, and the 80 percent is the back-list offerings. Over time, hits migrate down the tail and may become persistent in a cult market. Yes, The Rocky Horror Picture Show is still showing, and its cult is still showing up. Software, being a medium, is no different. Then again, cut and paste is a hit.

The long tail in terms of its hit and non-hit offerings.
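The hit and non-hit split can be sketched as follows, assuming we already have a use frequency for each feature. The 20 percent cut, the function name, and the sample frequencies are illustrative assumptions. Real data will rarely land on a clean ratio.

```python
def pareto_split(frequencies, hit_percent=20):
    """Split use frequencies into a front list (hits) and a back list (tail).

    Returns (hits, tail, hit_share), where hit_share is the fraction of all
    use accounted for by the hits. Integer arithmetic is used for the cut so
    that, for example, 20 percent of 15 items is exactly 3 items.
    """
    ordered = sorted(frequencies, reverse=True)
    hit_count = (len(ordered) * hit_percent + 99) // 100  # ceiling division
    hits, tail = ordered[:hit_count], ordered[hit_count:]
    total = sum(ordered)
    hit_share = sum(hits) / total if total else 0.0
    return hits, tail, hit_share


# Hypothetical use frequencies for ten features.
feature_use = [500, 300, 50, 40, 30, 25, 20, 15, 12, 8]
hits, tail, hit_share = pareto_split(feature_use)
print(hits, len(tail), hit_share)
```

In this made-up data the two most-used features, 20 percent of the list, account for 80 percent of all use.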

For software, the hits or front list consist of the GUIs for the operating system, browser, and maybe some portion of the application’s key features. The rest of the features are spread out down the tail of the distribution.

The Long Tail of the features in the GUI.

An interesting aspect of the GUI, depicted as a long tail, is that the whole-product components show up, while the software stack is hidden. Black boxes are not considered. If an application provides an API, then there is another interface, which will have use frequencies associated with it. Given that the whole-product components provide most of the front list, the layering of the stack is reflected in the use frequencies of both the APIs and the GUIs considered. This provides an interesting representation.

The software stack and the GUI as long tails

In this figure the APIs are ordered by frequency of use in the model component of the representation. The stack reaches much deeper than the application, thus the arrow. Likewise, the GUI features are ordered by frequency of use in the view component of the representation. Use extends well into the future, thus the arrow.

This figure also illustrates the separation between carrier and carried when considering software to be a medium. The Pareto split is the boundary between carrier and carried. Technology carries the content focused on the jobs to be done, which originate in the functional domains elicited during requirements collection.

The split between carrier and content, with software as the medium, is recursive. Tech carries the automated domain. But on an organizational level, the product is carried by an organization, or by a collection of organizations, depending on the degree of horizontal integration and the participation of complementors and fast followers in the market.

Carriers need not be physically linked. At an astrophysical level, gravity, aka the force of attraction, and the force of repulsion balance each other out over a wide area of, rather than a single point of, equilibrium. They can be represented by two power law distributions that face off and overlap. The area under their intersection constitutes the area of equilibrium.
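One way to picture two power laws facing off is to place their origins a fixed distance apart and take the area under whichever curve is lower at each point. The sketch below is my reading of that picture, not a physical model. The exponent, the scales, the distance, and the function name are all assumptions.

```python
import math


def overlap_area(distance, exponent, scale_left=1.0, scale_right=1.0, steps=10_000):
    """Approximate the area under the lower of two facing power-law curves.

    The left curve is scale_left * x**-exponent, measured from the left origin.
    The right curve is scale_right * (distance - x)**-exponent, measured from
    the right origin. Midpoint integration keeps x away from both origins,
    where the curves blow up.
    """
    width = distance / steps
    area = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        left = scale_left * x ** -exponent
        right = scale_right * (distance - x) ** -exponent
        area += min(left, right) * width
    return area


# Symmetric case: equal scales, so the curves cross midway between the origins.
area = overlap_area(distance=10.0, exponent=1.0)
print(round(area, 3), round(2 * math.log(2), 3))
```

With equal scales and an exponent of 1, the exact overlap is 2 ln 2, which the numeric estimate should match closely. Unequal scales move the crossing point toward the weaker curve's origin.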

When an application moves across the technology adoption lifecycle, the focus of its offer changes. It starts off being carrier focused. Then it shifts to a carried/content focus. It goes back to a carrier focus in the IT horizontal/early mainstreet, and back to a content focus in the late mainstream/consumer/convergent/SaaS markets. It continues to change focus as it moves into the laggard and phobic markets.

In the late mainstream, strategies like value-basing put an emphasis on the business components of the offer and expand the breadth of the business or organizational components. The product and the organizational components overlap and oscillate in terms of focus across the technology adoption lifecycle as well. The overlapping marks a competition, similar to that of the forces of gravity and repulsion, between product and organizational offer components. As the product becomes commoditized and pricing pressures drive the value of the features down, organizational components strive to keep the margins high.

All of this leads to another representation.

Power Law with stacks orgs and complementors

Notice that one complementor is shown in the diagram. A complementor might have come into existence earlier than shown, but it always comes into existence after the primary vendor provides an API. This time lag is indicated by the “dx” notation in the gap between the balance or equilibrium point and the origin of the complementor’s vector of differentiation, the axis serving as the floor for the complementor’s power law distributions.

Complementors always compete in the late mainstream phase of the technology adoption lifecycle, because they compete on promo dollars. Market leadership is driven solely by promo spend. No complementor has persistent monopoly power.

The aqua triangle below the power law distributions represents the decision tree that gives rise to the software and the vendor organization. The thin aqua line divides the software into its API and GUI components. The thicker partitions of the triangle divide the decisions into software (left), in-offer organizational components (middle), and out-of-offer organizational components (right), aka the organization’s black box/backend/stack. The organizational components reach into the value chain supporting the underlying technology and product(s).

Comments? Please comment. Thanks.