Archive for the ‘commoditization’ Category

Box Plots and Beyond

December 7, 2015

Last weekend, I watched some statistics videos. Along with the stuff I already knew came some new stuff. I also wrestled with some geometry involving triangles and hyperbolas.

We’ll look at box plots in this post. They tell us what we know. They can also tell us what we don’t know. Tying box plots back to product management, they give us a simple tool for saying no to sales. “Dude, your prospect isn’t even an outlier!”

So let’s get on with it.

Box Plots

In the beginning, yeah, I came down that particular path, the one starting with the five number summary. Statistics can take any series of numbers and summarize it into the five number summary. The five number summary consists of the minimum, the first quartile, the median, the third quartile, and the maximum.
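As a minimal sketch, the five number summary can be computed with nothing but the Python standard library. The sample numbers here are made up, and note that different tools use different quartile conventions, so the quartile values can vary slightly between packages:

```python
import statistics

# A made-up series of numbers to summarize.
data = [7, 15, 36, 39, 40, 41]

# statistics.quantiles with method="inclusive" uses the common
# linear-interpolation convention for quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)  # (minimum, Q1, median, Q3, maximum)
```

That tuple is everything a basic box plot needs: the box spans Q1 to Q3, the line in the box is the median, and the whiskers reach toward the extremes.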

Boxplots are also known as box and whisker charts. They also show up as candlestick charts. We usually see them in a vertical orientation, and not a horizontal one.

Notice that the 5th and 95th percentiles appear in the figure on the right, but not the left. Just ignore them and stick with the maximum and minimum, as shown on the left. Notice that outliers appear in the figure on the left, but not on the right. Outliers might be included in the whisker parts of the notation, or fall beyond the reach of the whiskers. I go with the latter. The figure on the left flags outliers as points more than 3/2 of the IQR above the upper quartile, or more than 3/2 of the IQR below the lower quartile. Others write that factor as 1.5. Notice that there can be other data points beyond the outliers. We omit or ignore them.
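The 1.5 × IQR fence rule in the figures is easy to apply in code. A sketch, with made-up numbers:

```python
import statistics

data = [2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 19]  # made-up numbers; 19 looks suspicious

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # the interquartile range, Q3 - Q1

# Anything outside the fences gets flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [19]
```

In product management terms, the list comprehension is the "just say no" filter: a request from outside the fences isn't even an outlier of the population you serve.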

The real point here is that the customer we talk about listening to is somewhere in this notation. Even when we are stepping over to an adjacent step on the pragmatism scale, we don’t do it by stepping outside our outliers. We do it by defining another population and constructing a box and whiskers plot for that population. When sales, through the randomizing processes they use, brings us a demand for functionality beyond the outliers of our notations, just say no.

We really can’t work in the blur we call talking to the customer. Which customer? Are they really prospects, aka the potentially new customer, or the customer, as in the retained customer? Are they historic customers, or customers in the present technology adoption lifecycle phase? Are they on the current pragmatism step or the ones a few steps ahead or behind? Do you have a box and whisker chart for each of those populations, like the one below?

This chart ignores the whiskers. The color code doesn’t help. Ignore that. Each stick represents a single normal distribution within a collective normal distribution. Each group would be a population. Here the sticks are arbitrary, but they could be laid left to right in order of their pragmatism step. Each such step would have its own content marketing relative to referral bases. Each step would also have its own long tail of functionality use frequencies.

Now, we’ll take one more look at the box plot.

Here the outliers are shown going out to +/- 1.5 IQRs beyond the Q1 and Q3 quartiles. The IQR is the distance between Q1 and Q3. It’s all about distances.

The diagram also shows Q2 as the median and correlates Q2 with the mean of a standard normal distribution. Be warned here that the median may not be the mean, and when it isn’t, the real distribution would be skewed and non-normal. Going further, keep in mind that a box plot is about a series of numbers. They could be z-scores, or not. Any collection of data, any series of data, has a median, a minimum, a maximum, and quartiles. Taking the mean and the standard deviation takes more work. Don’t just assume the distribution is normal or fits under a standard normal.
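A quick, cheap check along those lines is to compare the median against the mean; a sizable gap hints at skew. A sketch with made-up numbers:

```python
import statistics

# Nine small values and one big one; the big one drags the mean up.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

median = statistics.median(data)
mean = statistics.fmean(data)
print(median, mean)  # 3.0 4.7 -- mean well above median suggests right skew
```

For a symmetric distribution the two land in roughly the same place; here the gap is the tell.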

Notice that I added the terms upper fence and lower fence to the figure; the fences are the cutoffs the whiskers run out to, so they are another way of referring to the whiskers.

The terminology and notation may vary, but in the mathematics sense, you have a sandwich. The answer is between the bread, aka the outliers.

The Normal Probability Plot

A long while back, I picked up a book on data analysis. The first thing it talked about was how to know if your data was normal. I was shocked. We were not taught to check this before computing a mean and a standard deviation. We just did it. We assumed our data fit the normal distribution. We assumed our data was normal.

It turns out that it’s hard to see if the data is normal. It’s hard to see on a histogram. It’s hard to see even when you overlay a standard normal on that histogram. You can see it on a box and whiskers plot. But it’s easier to see with a normal probability plot. If the ordered data forms a straight line on the plot, it’s normal.
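Under the hood, a normal probability plot pairs the ordered data with theoretical normal quantiles. A sketch of that computation, using the correlation coefficient as a stand-in for eyeballing the straight line; the plotting-position formula (i + 0.5) / n is one of several conventions in use:

```python
import random
import statistics

random.seed(42)
data = sorted(random.gauss(0, 1) for _ in range(200))  # simulated normal data
n = len(data)

# Theoretical standard-normal quantiles at plotting positions (i + 0.5) / n.
nd = statistics.NormalDist()
theoretical = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]

# If the data are normal, the (theoretical[i], data[i]) points fall on a
# line, so their correlation coefficient should be very close to 1.
mx = statistics.fmean(theoretical)
my = statistics.fmean(data)
cov = sum((x - mx) * (y - my) for x, y in zip(theoretical, data))
sx = sum((x - mx) ** 2 for x in theoretical) ** 0.5
sy = sum((y - my) ** 2 for y in data) ** 0.5
r = cov / (sx * sy)
print(round(r, 3))  # a noticeably low r would flag non-normal data
```

Run the same computation on skewed data and r drops visibly, which is the numeric version of the line bending away at the tails.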

The following figure shows various representations of some data that is not normal.

Below are some more graphs showing the data to be normal on normal probability plots.

And, below are some graphs showing the data to not be normal on normal probability plots.

Going back to the first normal probability plot, we can use it to explore what it is telling us about the distribution.

Here I drew horizontal lines where the plotted line became non-normal, aka where the tails occur. Then, I drew a horizontal line representing the mean of the data points excluding the outliers. Once I exclude the tails, I’ve called the bulk of the graph, the normal portion, the normal component. I represent the normal component with a normal distribution centered on the mean. I’ve labeled the base axis of the normal as x0.

Then, I went on to draw vertical lines at the tails and the outermost outliers. I also drew horizontal lines from the outermost outliers so I could see the points of convergence of the normal with the x-axis, x0. At those points of convergence, I put black swans with lengths equal to the heights, or thicknesses, of the tails, giving me x1 and x2.

Here I am using the notion that black swans account for heavy tails. The distribution representing the normal component is not affected by the black swans. Some other precursor distributions were affected, instead. See Fluctuating Tails I and Fluctuating Tails II for more on black swans.

In the original sense, black swans create thick tails when some risk causes future valuations to fall. Rather than thinking about money here I’m thinking about bits, decisions, choices, functionality, knowledge–the things financial markets are said to price. Black swans cause the points of convergence of the normal to contract towards the y-axis. You won’t see this convergence unless you move the x-axis, so that it is coincident with the distribution at the black swan. A black swan moves the x-axis.

Black swans typically chop off tails. In a sense, they remove information. When we build a system, we add information. As used here, I’m using black swans to represent the adding of information. Here the black swan adds tail.

Back to the diagram.

After all that, I put the tails in with a Bezier tool. I did not go and generate all those distributions with my blunt tools. The tails give us some notion of what data we would have to collect to get a two-tailed normal distribution. Later, I realized that if I added all that tail data, I would have a wider distribution and consequently a shorter distribution. Remember that the area under a normal is always equal to 1. The thick blue line illustrates such a distribution that would be inclusive of two tails on x1. The mean could also be different.
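The wider-so-shorter tradeoff falls directly out of the normal’s formula: the peak height is 1 / (σ√(2π)), so doubling σ halves the peak, while the total area stays pinned at 1. A quick numeric check of both claims:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

for sigma in (1.0, 2.0):
    peak = normal_pdf(0.0, sigma=sigma)
    # Crude Riemann-sum integration over +/- 10 sigma.
    step = 0.001
    steps = int(20 * sigma / step)
    area = sum(normal_pdf(-10 * sigma + i * step, sigma=sigma)
               for i in range(steps)) * step
    print(f"sigma={sigma}: peak={peak:.3f}, area={area:.3f}")
```

The peak drops from about 0.399 to about 0.199 while the area holds at 1.000, which is exactly the wider-and-shorter distribution the thick blue line illustrates.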

One last thing: the distribution behind the normal probability plot I used was said to be a symmetric distribution with thick tails. I did not discover this. I read it. I did test the symmetry by extending the x1 and x2 axes. The closer together they are, the more symmetric the normal distribution would be. It’s good to know what you’re looking at. See the source for the underlying figure and more discussion at academic.uprm.edu/wrolke/esma3101/normalcheck.htm.

Onward

Always check the normality of your data with a normal probability plot. Tails hint at what was omitted during data collection. Box plots help us keep the product in the sandwich.

Economic Indifference II

March 24, 2009

Back on my now inaccessible blog, I talked about the need for product managers to practice economic indifference. I used the mathematics of manifolds to show how decisions made at one level of a system can be independent of those made at another level. I say “can” here because the decisions made at subordinate levels need to be consistent with the constraints placed on them by the levels above them. But, there is a flip side: the enablers provided by those above them. Economic indifference is one such enabler.

When I tell you to build me a car, you can build any car. I can’t come back and say I want a truck. Nor can I come back and say I want a red car. My specification error is my problem, not yours. I left you the freedom, and you took it. My bad. Actually, it’s my bad for now wanting something more specific than what I had specified. The granularity of my spec dictates what I will have to accept, or, put in different terms, the amount of economic indifference I have to accept. Or, from the perspective of the developer, the degrees of freedom I left them. And, Agilists want as many degrees of freedom as they can get.

That said, today I’ll use the mathematics of vectors to talk about many things around product management.

The Basic Vector

A vector! What does a vector have to do with business? Well, strategy, vision, forecast, capabilities, processes, projects, competition, and ultimately economic indifference.

From Here to There

When we need to figure out where a shop is in the mall, or which bus to take, we look for a map, and hopefully, it has a You Are Here indicator. Then, we look for our objective. The map provides us with a convoluted path to take to get there. We could draw a vector on the map, an as-the-bird-flies view, but it doesn’t help much on the ground. That vector would be representative of our economic indifference. The shops along the way might catch our attention, we might get sucked in, we might buy something, we might have to take that something to the car, or grab a bite to eat before we finally arrive at the pointy end of the vector, our destination. Why did I come here, we wonder. Ah, Saturday, and I’m not at work? What’s wrong with me, playing hooky. The team….

Getting There--How One Vector Summarizes a Bunch of Vectors

Our path around the mall is shown in red. We decompose our objective vector into a collection of vectors. That collection of vectors get us there. That collection of vectors is closer to actuals, or the reality, and a little less economically indifferent than our objective vector.

Decomposed Yet Again, More Vectors

Leaving our mall example behind, we have a business objective (black), and we have these capabilities, resources, and staff (red) to get us to our goal. The goal incorporates the matter of time into the vector. We arrive at a time and place. But, one of the capabilities is just a plan. Its realization shows up as another collection of vectors (blue). The red vectors represent ongoing operational capabilities. The blue vectors represent a project that will put the process into operation. We don’t have that capability right now, so it’s a risk.
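The decomposition works in both directions: the component vectors sum back to the objective vector. A minimal sketch in 2-D, with made-up numbers; real capability vectors would carry more dimensions:

```python
def add_vectors(vectors):
    """Component-wise sum of a list of same-length vectors."""
    return tuple(sum(components) for components in zip(*vectors))

# The red, on-the-ground legs of the path, as (x, y) displacements.
legs = [(3, 1), (2, 4), (1, -2), (4, 3)]

# The black objective vector is just their sum; it is indifferent to
# how the legs actually got there.
objective = add_vectors(legs)
print(objective)  # (10, 6)
```

The objective vector only cares about where you end up, which is the economic indifference: any set of legs with the same sum would do.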

How did we determine our goal and draw the objective vector? We did our forecast.

Forecast Factors as a Collection of Vectors

You build a forecast with metrics that show where you’ve been over time, and where you expect to be over time. It might be a set of rates. Each forecast factor heads off in its own direction.

Vectors Collected, a Morpheme

We can move the vectors around, so that they all start at the same point. Oddly enough, it starts to look like a morpheme, the meaning constituents of words. Just an aside.

From Forecast Vectors to Strategy Vector

Here I've added all the forecast vectors together to arrive at my strategy vector. I did not weight the forecast vectors.

Strategy as an Organized Collection of Capabilities
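Adding the forecast vectors can be sketched the same way; a weight per factor is the obvious extension if some factors should count more than others. The factor names and numbers below are hypothetical:

```python
# Hypothetical forecast factors as 2-D vectors (direction and magnitude).
factors = {
    "new_customers": (2.0, 1.0),
    "renewals":      (1.0, 3.0),
    "price_changes": (0.5, -0.5),
}

def strategy_vector(factors, weights=None):
    """Sum the factor vectors, optionally weighting each one."""
    if weights is None:
        weights = {name: 1.0 for name in factors}
    x = sum(weights[name] * v[0] for name, v in factors.items())
    y = sum(weights[name] * v[1] for name, v in factors.items())
    return (x, y)

print(strategy_vector(factors))  # unweighted sum: (3.5, 3.5)
```

Passing a weights dict, say `{"renewals": 2.0, ...}`, would tilt the strategy vector toward the factors you trust or value most.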

Remember that the forecast variables were derived from time series data. A time series assumes that the policy basis and capability basis haven’t changed. If they change, the old numbers become unreliable. When you build an estimation database, you end up creating an estimate based on averages. You are estimating your existing capabilities. The red vectors represent those capabilities. If our strategy were a real strategy, the red vectors would converge on the black vector somewhere short of the arrow, because you would divide the strategy into timeframes.
Strategies are supposed to be long-term propositions. Capabilities are created to support them. The capabilities persist for a long time. They may persist longer than the strategy that birthed them. Those capabilities improve over time. At the same time the capabilities become constraints on strategy.
Vision is not based on a forecast, or on a collection of capabilities. With a vision, you point off in a direction, and build the capabilities to get there. A product might be a strategy where its improvements would be linear, or in other words, sustaining or continuous. A product might be a vision where its improvements would be non-linear, radical, or discontinuous. A vision departs from strategy, and costs quite a bit more than doing the same old same old day in and day out.

Strategy (Yesterday) and Vision (Tomorrow)

Vision requires us to let go of the prior strategy and get with the new program. You might run into a vision when you get a new CEO. You will run into a vision if you ever face a market transition as described by Moore’s technology adoption lifecycle. It amounts to leaping into the new boat. The old boat is sinking. Well, maybe not yet, but its sinking is anticipated.

Transition to Vision

As you move to the new vision, you will use existing capabilities (red), reduce existing capabilities (gray), enhance existing capabilities (green), and add capabilities and processes through projects.
Moving your product to a new market, or extending your existing market would require the same kinds of efforts. Your product is a vector.
In the past few weeks, there has been a lot of discussion about fast followers, the loss of differentiation and price premiums, and commoditization. When your customers will no longer pay for additional capabilities you’ve incorporated into your product, you have been commoditized. Some customers may continue to pay, but never use those additional capabilities. In this case, your customers are overserved, and you can find yourself being attacked by a new entrant with a product based on a new technology. That entrant might not meet your performance thresholds, because they compete on other drivers. All of this involves messing with our vectors.
Commoditization forces you to find new drivers, new vectors of differentiation.
Fighting a fast follower depends on proprietary technologies or sleight of hand.

Routes to a Feature

A feature can be created quickly (red). It will be thin. It might involve adding a dialog, a button or menu option, a column or two to your database tables, and a few computations. The same feature can be created richly (blue). You do this to explore future opportunities, and to build things that don’t show up in the interface, or in the reverse engineering efforts to capture behavior. If SaaS does anything for us, it removes executable code from the hands of our competitors, so reverse engineering is about what is seen under testing at the interfaces. You might not expose the APIs for some of the depth you’ve created in your rich project. Expose just enough to make it look quick. Then, when the fast follower releases their catch-up, you release the next layer to exposure.
That exploration might increase the conceptual surface area or create a much richer conceptual geography for later exploitation. Let them follow. You can keep on rolling out premium. Maybe you’ll teach them to slow down, or look deeper. “Now eating time at the lunch counter.”
You could also take a vector view of your team. You could use those n-dimensional charts in Excel to map out the abilities of your team members, their estimation factors, your influence with each of them, and your communications effectiveness with each of them. You could then add their morpheme views together to form words, if you will, which add up the vectors to a team score or vector.
That might be a bit much, but much is possible with a good representation of the problem of shipping on time and hitting your P&L numbers.
Getting distance from the details grants economic indifference. The big vector doesn’t care, but the comprising vectors determine the success of the big vector.
And, for all those product managers who want the stick, instead of the hard work of building influence, one last vector view.

The Stick and the Goal

When hit with a stick, the capability leaves, taking some portion of your Gantt chart with it. Congrats! Forget the stick.