Archive for December, 2015


December 24, 2015

In this blog, I’ve wondered why discontinuous innovation is abandoned by orthodox financial analysis. Some of that behavior is due to economies of scale. Discontinuous innovation creates its own category and its own market populations. Discontinuous innovation doesn’t work when forced into an existing population by economies of scale, or by the economies of an already serviced population.

But there is more beyond that. In “The Innovator’s Dilemma,” Christensen proposed separation as the means to overcome those economies-of-scale problems. Accountants asserted that it cost too much, so in the end his best idea did not get adopted. Did he use the bowling alley to build his own category and market population? No. Still, his idea was dead on, but not in the familiar way of separation as a spin out.

Further into the math, however, we find other issues, like the underlying geometry of the firm: it evolves from a Dirac function, to a set of Poisson games, to an approach to the normal, to the normal, and to the departure from that normal. The firm’s environment starts in a taxicab geometry (hyperbolic) where linearity is fragmented, becomes Euclidean where linearity is continuous, and moves on to spherical where multiple linearities, multiple orthodox business analyses, work simultaneously.

With all these nagging questions, we come to the question of coordinate systems. Tensors were the answer to observing phenomena in different frames of reference. Tensors make transforms between systems with different coordinate systems simple. Remember that mathematicians always seek the simpler. For a quick tutorial on tensors, watch Dan Fleisch explain tensors in “What’s a Tensor?”

In seeking the simpler, mathematicians start off hard. In the next video, the presenter talks about some complicated stuff; see “Tensor Calculus 0: Introduction.” Around 48:00/101:38 into the video, the presenter claims that the difficulties in the examples were caused by the premature selection of the coordinate systems. Cylindrical coordinates involve cylindrical math, and thus cylindrical solutions; polar, similarly; linear, similarly. Tensors simplified all of that. The solutions were analytical, thus far removed from geometric intuition. Tensors returned us to our geometric intuitions.

The presenter says that when you pick a coordinate system, “… you’re doomed.” “You can’t tell. Am I looking at a property of the coordinate system or a property of the problem?” The presenter confronts the issue of carried and carrier, or mathematics as media. I’ve blogged about this same problem in terms of software, or software as media. What is carried? And what is the carrier presenting us with the carried?

Recently, there was a tweet linking to a report on UX developer hiring vs infrastructure developer hiring. These days the former is up and the latter is down. Yes, a bias towards stasis, and definitely away from discontinuous innovation, in a time when the economy needs the discontinuous more than the continuous. The economy needs some wealth creation, some value-chain creation, some new career creation. Continuous innovation does none of that. Continuous innovation captures some cash. But all we get from Lean and Open and fictional software is continuous innovation, replication, mismanaged outside monetizations, and hype, so we lose to globalism and automation.

I can change the world significantly, or serve ads. We’re choosing to serve ads.

Back to the mathematics.

I’m left wondering about kernels and how they linearize systems of equations. What does a kernel that linearizes a hyperbolic geometry look like? A spherical kernel, likewise? We linearize everything regardless of whether it’s linear or not. We’ve selected an outcome before we do the analysis, just like going forward with an analysis embedding a particular coordinate system. We’ve assumed. Sure, we can claim that the mathematics, the kernel, makes us insensitive, or enables us to be insensitive, to the particular geometry. We assume without engaging in what-ifs.

Kernels, like coordinate systems, have let us lose our geometric intuition.

There should be a way to do an analysis of discontinuous innovation without the assumptions of linearity, linearizing kernels, a Euclidean geometry, and a time-sheared temporality.

Time-sheared temporality was readily apparent when we drove Route 66. That tiny building right there was a gas station. Several people waited there for the next Model T to pull in. The building next to it is more modern by decades.

This is the stuff we run into when we talk about design or brand, or use words like early-stage: a mess that misses the point of the technology adoption lifecycle. Only the late main street and later phases involve the orthodox business practices typical of F2000 firms. That stuff didn’t work in the earlier phases. It doesn’t work when evaluating discontinuous innovation.


Is the underlying technology yet to be adopted? Does it already fit into your firm’s economies of scale? Do you wonder about those orthodox practices and how they fail your discontinuous efforts?





December 14, 2015

More statistics this week. Again, surprise ensued. I’ll be talking math, but thinking product management. I’ve always thought in terms of my controls and their frequency of use, but when did the data converge? When does it time-series on me? Agile and Minimum Viable Product are experiment based. But how deep is our data?

So while everything I’m going to say here may not be new to anyone, maybe you’ll be surprised somewhere along the way.

First, we start with the definition of probability.

01 Probability 01

The stuff between the equal signs is predicate calculus, or mathematical logic, the easy stuff. It’s just shorthand, shorthand that I never used to get greasy with in my notes. In college, I wanted to learn it, but the professor didn’t want to teach it. He spent half the semester reviewing propositional calculus, which was the last thing I needed.

Moving on.

01 Probability 02

What surprised me was “conditions or constraints.” That takes me back to formal requirements specification in the mid to late 80s, where they used IF…THEN… statements to prove the global context of what program proving could only prove locally. Requirements were questions. Or Prolog assertions that proved themselves.

Constraints are the stuff we deal with in linear programming, so we get some simultaneous equations underpinning our probabilities.

01 The World

The red stuff is the particular outcome. Anything inside the box is the sum of all outcomes. Just take the space outside the distribution as zero, or ground.
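That box-and-outcomes picture can be sketched as a few lines of code. This is a minimal illustration of my own, not from the post: the sample space is “the world,” a constraint picks out the red stuff, and the probability is the favorable outcomes over all outcomes.

```python
from fractions import Fraction
from itertools import product

# The world: all ordered rolls of two six-sided dice.
world = list(product(range(1, 7), repeat=2))

# A condition/constraint picks out an event -- the "red stuff" inside the box.
event = [(a, b) for (a, b) in world if a + b == 7]

# Probability as favorable outcomes over the sum of all outcomes.
p = Fraction(len(event), len(world))
print(p)  # 1/6
```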

Lately, I got caught on the issue of what the difference is between iteration and recursion. I Googled it. I read a lot of that. I’ve done both. I’ve done recursive COBOL, something my IT-based, aka data processing, professor didn’t like. No, it was structured coding all the way. Sorry, but I was way early with objects at that point. But, back to the difference: no, none of it really struck me as significant.

What I really wanted was some explanation based on the Ito/Markov chain notions of memory. So I’ll try to explain it from that point of view. Let’s start with iteration.


02 Iteration

Iteration has some static or object variables where it saves the results of the latest iteration. I’m using an index and the typical for-loop constructs. There are other ways to loop.

That’s code, but more significant is the experiment that we are iterating. The conditions and context of the experiment tell us how much data has to be stored. In iteration, that data is stored so that it can be shared by all the iterations. Recursion will put this data elsewhere. The iteration generates or eats a sequence of data points. You may want to process those data points, so you have to write them somewhere. The single memory will persist beyond the loop doing the iteration, but it will only show you the latest values.
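A minimal sketch of that single shared memory, using a running mean as a stand-in for the experiment: the loop’s variables are overwritten on every pass, persist after the loop ends, and only show the latest values.

```python
# Iteration's shared memory: a few variables updated on every pass.
data = [2.0, 4.0, 6.0, 8.0]

total = 0.0   # shared memory, overwritten each iteration
count = 0
for x in data:
    total += x
    count += 1
    mean_so_far = total / count  # only the latest value survives

# The memory persists beyond the loop, but shows only the final state.
print(mean_so_far)  # 5.0
```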

It can take a long time to iterate to, say, the next digit in pi. We can quickly forecast some values with some loose accuracy, call it nearly inaccurate, and replace the forecast with accurate values once we obtain those accurate values. Estimators and heuristics do this roughing out, this sketching, for us. They can be implemented as iterations or recursions. Multiprocessing will push us to recursion.
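As a hedged illustration of that roughing out (my example, not the post’s), the slowly converging Leibniz series for pi gives a loosely accurate forecast after a few iterations, which later, longer iterations replace with a more accurate value.

```python
import math

# Leibniz series: pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...). It converges
# slowly, so early iterations are a rough forecast, later ones a refinement.
def pi_estimate(iterations):
    total = 0.0
    for k in range(iterations):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total

rough = pi_estimate(10)        # quick, loosely accurate forecast
better = pi_estimate(100_000)  # the slower, accurate replacement

print(abs(math.pi - better) < abs(math.pi - rough))  # True
```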

03 Iteration w Heuristic

Notice that I’ve drawn the heuristic’s arc to and from the same places we used for our initial iterations or cycles. The brown line shows the heuristic unrolled against the original iterations. This hints towards Fourier Analysis with all those waves in the composition appearing here just like the heuristic. That also hints at how a factor analysis could be represented similarly. Some of the loops would be closer together and the indexes would have to be adjusted against a common denominator.

Throughout these figures I’ve drawn a red dot in the center of the state. Petri nets use that notation, but I’m not talking Petri nets here. The red dots were intended to tie the state to the memory. The memory has to do with the processing undertaken within the state, and not the global notions of memory in Markov chains. The memory at any iteration reflects the state of the experiment at that point.


In recursion, the memory is in the stack. Each call has its own memory. That memory is sized by the experiment, and used during the processing in each call. Iteration stops on some specified index, or conditions. Recursion stops calling down the stack based on the invariant and switches to returning up the stack. Processing can happen before the call, before the return, or between the call and the return. Calling and returning are thin operations; processing, thick.
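A small sketch of those mechanics, under my own assumptions: each call owns its slice of memory on the stack, the empty-list base case plays the role of the invariant that switches calling to returning, and the processing here happens between the call and the return, on the way back up.

```python
# Recursion: each call's memory lives in its own stack frame. The base
# case (the invariant) stops the descent; processing happens on the return.
def sum_squares(xs):
    if not xs:                    # invariant: stop calling, start returning
        return 0
    head, *rest = xs              # this frame's own memory
    below = sum_squares(rest)     # the call, pushing a new frame
    return head * head + below    # processing between the call and return

print(sum_squares([1, 2, 3]))  # 14
```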

04 Recursion

The individual memories are shown as red vertical lines inside the spiral or tunnel. We start with calls and when we hit the invariant, the blue line, we do the processing and returning. We start at the top of the stack. Each call moves us towards the bottom of the stack, as defined by the invariant. Each return moves us back towards the top of the stack. The graph view shows the location of the invariant. The calling portion of the tunnel is shorter than the processing and returning portion of the tunnel.

Notice that I’m calling the invariant the axis of symmetry. That symmetry would be more apparent for in-order evaluation. Pre-order and post-order evaluation would be asymmetrical, giving rise to skewed distributions.
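The symmetry claim can be made concrete with a toy tree (my example): in-order splits the processing evenly around each node, while pre-order and post-order push all of it to one side of the call.

```python
# A tiny binary tree as nested tuples: (value, left, right).
tree = (2, (1, None, None), (3, None, None))

def traverse(node, order, out):
    if node is None:
        return out
    value, left, right = node
    if order == "pre":
        out.append(value)       # all processing before the calls: skewed
    traverse(left, order, out)
    if order == "in":
        out.append(value)       # processing between the calls: symmetric
    traverse(right, order, out)
    if order == "post":
        out.append(value)       # all processing after the calls: skewed
    return out

print(traverse(tree, "pre", []))   # [2, 1, 3]
print(traverse(tree, "in", []))    # [1, 2, 3]
print(traverse(tree, "post", []))  # [1, 3, 2]
```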

Recursion is used in parsers and in processing trees, potentially game trees. In certain situations we are looking for convergences of distributions or sequences.

05 Convergence and Sequence

The black distribution here represents a Poisson distribution. This is the Poisson distribution of the Poisson game typical of the early adopter in the bowling alley of the technology adoption lifecycle. That Poisson distribution tends to the normal over time through a series of normals. The normals differ in the width of their standard deviations. That increase in widths over time is compensated for by lower heights, such that the area under each of those normals is one.
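That tendency to the normal can be checked numerically. This sketch (my own, not from the post) compares a Poisson pmf against a normal with the same mean and variance; as the mean grows, the worst-case gap shrinks.

```python
import math

# Poisson pmf computed in log space for numerical stability.
def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Worst-case gap between Poisson(lam) and a normal with mean lam, sd sqrt(lam).
def max_gap(lam):
    return max(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, math.sqrt(lam)))
               for k in range(int(4 * lam)))

# The gap shrinks as lam grows: the Poisson tends to the normal.
print(max_gap(5) > max_gap(50) > max_gap(500))  # True
```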

We also show that each call or iteration can generate the next number in a sequence. That sequence can be consumed by additional statistical processing.

06 Numeric Convergence

Here, in a more analytic process, we are seeking the convergence points of some function f(n). We can use the standard approach of specifying a bound for the limit, or a more set-theoretic limit where two successive values are the same, aka they cannot be distinct elements of the same set. Regardless of how that limit is specified, those limits are the points of convergence. Points of convergence give us the bounds of our finite world.
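Both stopping rules can be sketched in one loop (my example, using Newton’s method for the square root of 2): a tolerance bound on successive values, where setting the tolerance to zero would demand an exactly repeated value.

```python
# Iterate f until two successive values are within eps of each other.
# eps=0 would demand the set-theoretic rule: an exactly repeated value.
def converge(f, x0, eps):
    prev, curr = x0, f(x0)
    while abs(curr - prev) > eps:
        prev, curr = curr, f(curr)
    return curr

step = lambda x: (x + 2.0 / x) / 2.0   # Newton step for x*x = 2
root = converge(step, 1.0, eps=1e-12)  # point of convergence: sqrt(2)

print(abs(root * root - 2.0) < 1e-9)  # True
```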

Throughout, I’ve used the word tunnel. It could be a spiral, or a screw. Wikipedia has a nice view of two 3D spirals; take a look. I didn’t get that complex here.

07 3D Sphere Spiral


When you experiment, and every click is an experiment in itself, or in aggregate, how long will it take to converge to a normal distribution, or to an analytic value of interest? What data is being captured for later analysis? What constraints and conditions are defining the experiment? How will you know when a given constraint is bent or busted, which in turn breaks the experiment and subsequent analysis?




Box Plots and Beyond

December 7, 2015

Last weekend, I watched some statistics videos. Along with the stuff I know, came some new stuff. I also wrestled with some geometry relative to triangles and hyperbolas.

We’ll look at box plots in this post. They tell us what we know. They can also tell us what we don’t know. Tying box plots back to product management, it gives us a simple tool for saying no to sales. “Dude, your prospect isn’t even an outlier!”

So let’s get on with it.

Box Plots

In the beginning, yeah, I came down that particular path, the one starting with the five number summary. Statistics can take any series of numbers and summarize them into the five number summary. The five number summary consists of the minimum, the maximum, the median, the first quartile, and the third quartile.
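The five-number summary can be computed straight from the standard library; a small sketch, where the `inclusive` method matches the common textbook quartile convention.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13]

# Quartiles via the stdlib; 'inclusive' treats the data as the whole population.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Minimum, first quartile, median, third quartile, maximum.
summary = (min(data), q1, q2, q3, max(data))
print(summary)  # (1, 4.0, 7.0, 10.0, 13)
```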

boxplot 01

Boxplots are also known as box and whisker charts. They also show up as candlestick charts. We usually see them in a vertical orientation, and not a horizontal one.

boxplot 04

Notice that the 5th and 95th percentiles appear in the figure on the right, but not the left. Just ignore them and stick with the maximum and minimum, as shown on the left. Notice that outliers appear in the figure on the left, but not on the right. Outliers might be included in the whisker parts of the notation or placed beyond the reach of the whiskers. I go with the latter. The figure on the left says the outliers lie more than 3/2’s of the IQR above the upper quartile, or more than 3/2’s of the IQR below the lower quartile. Others say 1.5 × IQR; it’s the same number. Notice that there are other data points beyond the outliers. We omit or ignore them.
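Those fences are easy to compute; a minimal sketch with made-up data, flagging anything beyond 1.5 × IQR from the quartiles as an outlier.

```python
import statistics

data = [2, 3, 4, 5, 5, 6, 7, 8, 30]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# The fences: 1.5 IQRs beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are the outliers we omit or ignore.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [30]
```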

The real point here is that the customer we talk about listening to is somewhere in this notation. Even when we are stepping over to an adjacent step on the pragmatism scale, we don’t do it by stepping outside our outliers. We do it by defining another population and constructing a box-and-whiskers plot for that population. When sales, through the randomizing processes they use, brings us a demand for functionality beyond the outliers of our notations, just say no.

We really can’t work in the blur we call talking to the customer. Which customer? Are they really prospects, aka the potentially new customer, or the customer, as in the retained customer? Are they historic customers, or customers in the present technology adoption lifecycle phase? Are they on the current pragmatism step or the ones a few steps ahead or behind? Do you have a box and whisker chart for each of those populations, like the one below?


This chart ignores the whiskers. The color code doesn’t help. Ignore that. Each stick represents a normal distribution within a collective normal distribution. Each group would be a population. Here the sticks are arbitrary, but they could be laid left to right in order of their pragmatism step. Each such step would have its own content marketing relative to referral bases. Each step would also have its own long tail of functionality-use frequencies.

Now, we’ll take one more look at the box plot.


Here the outliers are shown going out to +/- 1.5 IQRs beyond the Q1 and Q3 quartiles. The IQR spans the distance between Q1 and Q3. It’s all about distances.

The diagram also shows Q2 as the median and correlates Q2 with the mean of a standard normal distribution. Be warned here that the median may not be the mean, and when it isn’t, the real distribution would be skewed and non-normal. Going further, keep in mind that a box plot is about a series of numbers. They could be z-scores, or not. Any collection of data, any series of data, has a median, a minimum, a maximum, and quartiles. Taking the mean and the standard deviation takes more work. Don’t just assume the distribution is normal or fits under a standard normal.

Notice that I added the terms upper and lower fence to the figure, as that is another way of referring to the whiskers.

The terminology and notation may vary, but in the mathematics sense, you have a sandwich. The answer is between the bread, aka the outliers.

The Normal Probability Plot

A long while back, I picked up a book on data analysis. The first thing it talked about was how to know if your data was normal. I was shocked. We were not taught to check this before computing a mean and a standard deviation. We just did it. We assumed our data fit the normal distribution. We assumed our data was normal.

It turns out that it’s hard to see if the data is normal. It’s hard to see on a histogram. It’s hard to see even when you overlay a standard normal on that histogram. You can see it on a box-and-whiskers plot. But it’s easier to see with a normal probability plot. If the data, once ordered, forms a straight line on the plot, it’s normal.
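The straight-line test can be run without drawing anything; a sketch with made-up data and the common (i − 0.5)/n plotting-position convention: ordered data against theoretical normal quantiles, with a correlation near 1 suggesting approximate normality.

```python
import statistics

nd = statistics.NormalDist()  # standard normal, for theoretical quantiles

data = sorted([4.8, 5.1, 4.9, 5.3, 5.0, 5.2, 4.7, 5.4])
n = len(data)

# Theoretical quantiles at plotting positions (i - 0.5) / n.
theoretical = [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Pearson correlation: near 1 means the plot is close to a straight line.
def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

r = pearson(theoretical, data)
print(r > 0.9)  # True: this sample plots close to a straight line
```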

The following figure shows various representations of some data that is not normal.

Not Normal on Histogram, Boxplot, Normal Probability Plot

Below are some more graphs showing the data to be normal on normal probability plots.

Normal Data

And, below are some graphs showing the data to not be normal on normal probability plots.

Non-normal Data

Going back to first normal probability plot, we can use it to explore what it is telling us about the distribution.

Normal Probability Plot w Normal w Tails

Here I drew horizontal lines where the plotted line became non-normal, aka where the tails occur. Then I drew a horizontal line representing the mean of the data points excluding the outliers. Once I excluded the tails, I called the bulk of the graph, the normal portion, the normal component. I represent the normal component with a normal distribution centered on the mean. I’ve labeled the base axis of the normal as x0.

Then I went on to draw vertical lines at the tails and the outermost outliers. I also drew horizontal lines from the outermost outliers so I could see the points of convergence of the normal with the x-axis, x0. I drew horizontal lines at the extreme outliers. At those points of convergence I put black swans of lengths equal to the heights, or thicknesses, of the tails, giving me x1 and x2.

Here I am using the notion that black swans account for heavy tails. The distribution representing the normal component is not affected by the black swans. Some other precursor distributions were affected, instead. See Fluctuating Tails I and Fluctuating Tails II for more on black swans.

In the original sense, black swans create thick tails when some risk causes future valuations to fall. Rather than thinking about money here, I’m thinking about bits, decisions, choices, functionality, knowledge: the things financial markets are said to price. Black swans cause the points of convergence of the normal to contract towards the y-axis. You won’t see this convergence unless you move the x-axis so that it is coincident with the distribution at the black swan. A black swan moves the x-axis.

Black swans typically chop off tails. In a sense, they remove information. When we build a system, we add information. As used here, I’m using black swans to represent the adding of information. Here the black swan adds tail.

Back to the diagram.

After all that, I put the tails in with a Bezier tool. I did not go and generate all those distributions with my blunt tools. The tails give us some notion of what data we would have to collect to get a two-tailed normal distribution. Later, I realized that if I added all that tail data, I would have a wider distribution, and consequently a shorter distribution. Remember that the area under a normal is always equal to 1. The thick blue line illustrates such a distribution, one inclusive of two tails on x1. The mean could also be different.

One last thing: the distribution for the normal probability plot I used was said to be a symmetric distribution with thick tails. I did not discover this. I read it. I did test symmetry by extending the x1 and x2 axes. The closer together they are, the more symmetric the normal distribution would be. It’s good to know what you’re looking at. See the source for the underlying figure and more discussion.


Always check the normalcy of your data with a normal probability plot. Tails hint at what was omitted during data collection. Box plots help us keep the product in the sandwich.