Yes or No in the Core and Tails III

July 2, 2018

So the whole mess that I mentioned in Yes or No in the Core and Tails II kept bothering me. Then I realized that the order of the decisions didn’t matter: I can move the don’t cares to the bottom of my tree. It took a while to revise the tree. In the meantime, I read Part 2 of the Visual Introduction to Machine Learning, which led me to believe that moving the don’t cares was the correct thing to do.

Decision Tree 3

The figure is too small to see. But it is a complete binary tree of depth 11, which takes us to 2^11 = 2048 leaves, or a sample size of n = 2048. Notice that we achieve normality at n = 1800. This situation should present us with a skewed normal, but somehow the distribution is not skewed according to John Cook’s binary outcome sample size calculator. Of course, I’m taking his normality to mean standard normal. Those five layers of don’t cares give us a probability of 1/32, or p = 0.03125, at each branch at 2^6. Or, using the number from the higher-density portion of the tree, 1800/2048 = 0.8789, or the number from the lower-density portion of the tree, 248/2048 = 0.1211. No, I’m not going to calculate the kurtosis. I’ll take John’s normal to be a standard normal.
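
If you want to check that arithmetic, a few lines of plain Python will do; the only inputs are the numbers quoted above.

```python
# Numbers from the text: a complete binary tree of depth 11 has
# 2**11 = 2048 leaves; normality is achieved at n = 1800.
leaves = 2 ** 11
dense = 1800                    # higher-density portion of the tree
sparse = leaves - dense         # lower-density portion: 248 leaves

# Five layers of don't cares: 2**5 = 32 paths, p = 1/32 per path.
p_per_path = 1 / 2 ** 5

print(leaves, sparse)                 # 2048 248
print(p_per_path)                     # 0.03125
print(round(dense / leaves, 4))       # 0.8789
print(round(sparse / leaves, 4))      # 0.1211
```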

The neural net lesson taught a nice lesson about bias and variance, summed up by the Bias and Variance in ML figure. Yes, we are not doing machine learning, but another term for the same thing is statistical learning. We have the same problems with the statistical models we build for ourselves. We have bias and variance in our data depending on how we define our model, aka what correlations we use to define our model.

Model complexity is inversely related to bias. And, model complexity is directly related to variance. Part 2 of the Visual Introduction to Machine Learning explains this in more depth if you haven’t read it yet.

Watch the zedstatistics series on correlation. It will take some time to see how his models changed their definitions over the modeling effort. He is seeking that minimum error optimization shown in the figure. Much of it involves math, rather than data.

Given that we have pushed our don’t cares down below our cares, we set ourselves up with a sort of Cauchy distribution, as discussed in Tails and Epsilon. Cauchy distributions have thicker tails than normals, as shown in the normal on the right. In some sense, the tail thickness is set by moving the x-axis of the normal down. Here we did that by some epsilon. In a marketing sense, that would be an upmarket move without renormalization. But, in our “don’t care” sense, the don’t cares are defining the thickness of that epsilon.

With the normal distribution shown on the right, we are defining our known as what we got from our sample, our sort-of-known as the space of the don’t cares, and our unknowns as the yet to be surveyed populations. The sort-of-knowns represent our tradeoffs. We had to choose a path through the subtree, so we had to ignore other paths through the subtree. There were 32 paths, or 2^5 paths, of the 2^11 paths. Keep in mind that the don’t cares don’t mean we don’t care. Don’t cares allow us to solve a problem with a more general approach, which we usually take to minimize costs. But, in the marketing sense, it’s more that we didn’t ask yet. Once we ask and get a firm determination, we firm up one path from the 32 possible paths. We can use don’t cares to move forward before we have a definitive answer.

But, the bias and variance figure tells us something else. It tells us where in the machine learning sense the ideal solution happens to be. It is at the minimum of a parabola. In the frequentist sense, that minimum defines a specific standard deviation, or in the approach to the normal sense, that minimum tells us where our sample has become normal. It also tells us where we have become insensitive to outliers.

Once we have found the minimum, we have to realize that minimum in the development or definitional effort. Agilists would stop when they reach that minimum. Would they realize that they reached it? That is another matter. Ask if they achieved normality or not. But, the goal of machine learning is to approximate a solution with limited data, or to approximate the parabola with a limited number of points on the parabola. Once you’ve approximated the parabola, finding the minimum is a mathematical exercise.
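
That last step can be made concrete. Here is a minimal sketch: given three sampled points on an error curve, fit the parabola exactly and solve for the vertex. The points are invented for illustration.

```python
# Three sampled (complexity, error) points taken from a hypothetical
# error curve y = (x - 3)**2 + 2; any three distinct points determine
# the parabola y = a*x**2 + b*x + c exactly.
(x1, y1), (x2, y2), (x3, y3) = (1.0, 6.0), (2.0, 3.0), (4.0, 3.0)

# Divided differences give the quadratic coefficient directly.
a = ((y3 - y1) / (x3 - x1) - (y2 - y1) / (x2 - x1)) / (x3 - x2)
b = (y2 - y1) / (x2 - x1) - a * (x1 + x2)
c = y1 - a * x1 ** 2 - b * x1

# The minimum error sits at the vertex of the parabola.
x_min = -b / (2 * a)
y_min = a * x_min ** 2 + b * x_min + c
print(x_min, y_min)   # 3.0 2.0
```

With more than three noisy points you would least-squares fit instead, but the vertex formula is the same mathematical exercise.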

We can represent the product as a line through that minimum. That line would represent the base of a decision tree. I’ve represented these decision trees as triangles. Those triangles are idealizations. A generative effort in a constraint space is much messier than a triangle would suggest.

I’ve annotated the bias and variance graph with such a line. I’ve used a straight line to represent the realization, shown in the Bias and Variance in ML 2 figure. Every realization has an ontology representing the conceptualization to be realized. Every realization also has a taxonomy, but only after the realization. It boils down to ontologies before and taxonomies after. In the figure, the line from the minimum error to the baseline of the bias and variance graph is the target of the development effort. The realization line was projected and redrawn to the right. Then, the ontology and the taxonomy were added. Here the ontology and the taxonomy are identical. That is far from reality. The ontology and the taxonomy are symmetrical here, again far from reality.

The figure below the one on the right shows a messier view of a realization to be achieved over multiple releases. The solid red line has been released. There is an overall taxonomy, the enterprise taxonomy. And, there is the taxonomy of the user. The user’s effort generates some value that is significant enough to warrant continued development of the intended realization, shown as the red striped line. The user’s taxonomy is limited to the user’s knowledge of the carried content. The user’s knowledge might need to be enhanced with some training on the underlying concept. The user may not know the underlying conceptual model defined in the ontology. The developers might not know the underlying conceptual model either.

We cannot feed an ontology to a neural network. And, that neural network won’t discover that ontology. When Google wrote that Go-playing application, it discovered a way to play Go that no human would have discovered. There are more ways to get to a realization than through ontologies and taxonomies.

The value of a realization is achieved by projecting effort through the realization. That value is evaluated relative to a point of value. That value is evaluated by some valuation baseline. Different managers in an enterprise would have different views of the realization, and different valuation baselines.

The symmetries, asymmetries, and axes of those symmetries that I highlighted are significant indicators of what must be learned, and who must learn what is being taught. Value realization is tied to what must be taught. The need to teach, like the need to design interfaces, signals that the underlying ontology was not known to the users, and had to be learned by the developers. The need to teach and design shows up more in products designed for sale or external use.

So what is a product manager to do? Realize that the number of samples is much larger than what Cook’s formula tells us the minimum number of samples would be. Don’t cares are useful minimizations. There is one ontology and many taxonomies. Agile assumes that the ontology will be discovered by the developer. When the UI is not straightforward, the ontology has been departed from. And, there are many views of value and many valuation baselines.

Point of Value

June 21, 2018

A few days ago, someone tweeted a video where he was saying that it was all about value. We get the idea that the product is the value we are delivering, but that is a vendor-specific view. What we are really doing is providing a tool that the economic buyer purchases so their people can use it to create value beyond the tool. I’ve called this concept projecting the value through the product. It is the business case, the competitive advantage derived from use, that provides the economic buyer with value, not the product itself. This same business case can convince people in the early adopter’s two-degrees-of-separation network to buy the product, moving it across the chasm if the underlying technology involves a chasm.

An XML editor provides no value just because it was installed. The earliest version of Gartner’s total cost of ownership framework classified that install as effort not expended doing work. They called it a negative use cost. The product has not been used yet. The product has not generated any value, and yet costs were accumulating. Clearly, the XML editor did not provide the owner with any value yet.

Once a user tags a document with that XML editor and publishes that document, some value is obtained by someone. The user has a point of view relative to the issue of value. And, the recipient of that value has their own point of view on the value. When the recipient uses the information while writing another report, the value chain moves the point of view on the value again, and more value accumulates.

That led me to think in terms of a value chain, the triangle model, and the projection of value. So I drew a quick diagram and redrew it several times.

In this first figure (VP 03), the thick black line of the diagram on the left is the product. Different departments use the product. The use of the product is focused, and the value is delivered at the peaks of those downward-facing triangles. The value shown by the black triangles is used within the red triangles. The use inside the red triangles delivers value to the peaks of the red triangles. Notice there is a thick red line, labeled E. This represents the use of the underlying application by users outside the entities represented by the black triangles that report to the red entity. The underlying application is doing different things for users in different roles and levels.

All this repeats for the purple entities and values, and the blue entities and values. Value is projected from the interface to a point of value through work. That delivered value is projected again to the next point of value. The projections through work continue to accumulate value as the points of value are traversed.

The diagram on the right, in the top figure, diagrammatically depicts the value chaining and points of value, shown in the diagram on the left. It should be clear that the value is created through work, work enabled by the product. The product is the carrier, and the work is the carried content. The work should be entirely that of the purchaser’s users.

I’ve always thought of a product as being the commercialization of some bending or breaking of constraints (VP 01). I stick with physical constraints. In the figure on the left, we start with the linear programming of some process. Research developed a way to break a constraint across some limited range that I’ve called an accessibility gate. Once we can pass through that gate, we can acquire the value tied up in the accessed area (light blue).

The effort to pass through that gate involved implementing five factors. Those factors are shown as orange triangles that represent five different deliverables. Each of these factors is a different component of the software to be delivered. The order of delivery should increase the customer or client’s willingness to pay for the rest of the effort. Value has to be delivered to someone to achieve this increased willingness. Quickly delivering nothing gets us where? The thin purple curve orders the various points of value in a persuasive delivery order.

Some of the factors are being used, and projecting some value, before they are complete. The projection of value is not strictly linear. The factor on the far left involves code exclusively but is the last of the factors to deliver value. For this factor, it takes three releases to deliver value to three points of value.

The other factors require use by the customer’s or client’s organization to project the desired value.

Further value is accomplished by entities remote from the product. This value is dependent on the value derived by the entities tied to the product. I’ve labeled these earlier entities as being independent. The distant projections of value are dependent on the earlier ones. It remains to be seen if any of it is independent.

The path symmetries tie into the notions of skew and kurtosis as well as projections as being subsets or crosscutting concerns. Organizational structure does not necessarily tell us about where the value accrues.

In the next figure (VP 02), we take you from the user to the board member. The red rectangle represents the product. The thick black line indicates the work product projected from the user through the product. The thin red arrows represent the various changes in the points of value. The thin light blue lines show the view of the value.

At some point in the value chain, the value becomes a number and later a number in an annual report. The form of the underlying value will change depending on how a given point of value sees things. This is just as much an ethnographic process as requirements elicitation. These ethnographic processes involve implicit knowledge and the gaps associated with that implicit knowledge. Value projection is both explicit and implicit.



June 11, 2018

Today, someone out on twitter mentioned how power users insist on the complex, while the ordinary users stick with the simple. No. It’s more complicated than that. And, these days there is no excuse for the complex.

Lately, I’ve been watching machine learning videos and going to geek meetups. One guy was talking about how machine learning is getting easier as if that was a good thing. And, he is a geek. Easier, simpler happens. And, as it does, the technology can’t generate the income it used to generate. Once the average user can do machine learning without geeks, what will the geeks do to earn a living? Well, not machine learning.

The technology adoption lifecycle is organized by the pragmatism of the managers buying the stuff and the complications and simplicities of the technology. The technology starts out complicated and gets simpler until it vanishes into the stack. It births a category when it’s discontinuous, aka a completely new world, and it kills the category once it has gotten as simple as it can be. The simpler it gets, the less money can be made, so soon enough everybody can do it, and nobody can make any money doing it. We add complications so we can make more money. Actually, we don’t. Things don’t work that way.

So I drew a technology adoption lifecycle (TALC) vertically. I’ve modified the place of the technical enthusiasts in the TALC. They are a layer below the aggregating mean. They span the entire lifecycle. I left Moore’s technical enthusiasts at the front end of the vertical. And, I’ve extended the technical enthusiasts all the way out to the efforts prior to bibliographic maturity.


I used the word “Complicated” rather than complex. Complicated is vertically at the top of the figure. Simpler is at the bottom. The left edge of the technical enthusiast slice of the normal is the leading edge of the domain where the complicated, the complex, is encountered. The complex can be thought of like constraints. Once you simplify the complex, there is more complexity to simplify. The vertical lines represent consecutive simplifications. Where there are many vertical lines, the complications are those of the people working on the carrier aspects of the complexity. I drew a horizontal line to separate the early and late phases. I did this to ghost the complexity grid. There is more than enough going on in the distribution itself. The vertical lines below that horizontal line are the complexity lines related to the TALC phases on the right side of the TALC, to the right of the mean, to the right of the peak. Or in this figure, instead of the usual left and right, think above and below.

In the diagram, I put “Simpler” above (to the right of) “Complicated.” This is then labeled “Simpler 1.” We are still in the lab. We are still inventing. This simplification represents the first task sublimation insisted on by the TALC. This task sublimation happens as we enter into the late mainstreet, consumer phase. Technical enthusiasts don’t need simpler. But, to move something out of the IT horizontal into broader use, it has to get simpler.

Simpler is like graph paper. “Simpler 1” is distant from the baseline and aligned with the TALC phases, although the diagram separates them for clarity, hopefully.

The device phase, aka the phase for the laggard population, absolutely requires technology that is far simpler than what we had when we moved the underlying technology into the consumer phase, late mainstreet. Devices are actually more complicated because the form factor changes and an additional carrier layer gets added to everything. The orange rectangle on the left of the device phase represents the telco geeks and their issues. The carried content gets rewritten for simpler UI standards. The tasks done on a device shouldn’t be the same as those done on a laptop or a desktop. The device phase presents us with many form factors. Each of those form factors can do things better than other form factors. But, again, the tasks done on each would be limited.

In Argentine tango, when you have a large space in which to dance, you can dance in the large. But, when the crowd shows up or the venue gets tiny, we tighten up the embrace and cut the large moves. Our form factor shrinks, so our dance changes.

How would basketball feel if it was played on a football field?

The cloud phase, aka the phase for the phobic population, requires technology that is totally hidden from them. They won’t administer, install, upgrade, or bother in the least. The carrier has to disappear. So again the UI/UX standards change.

The phase specificity of the TALC should tell us that each phase has its own UI standards. With every phase, the doing has to get simpler. The complexities are pushed off to the technical enthusiasts who have the job of making it all seem invisible to the phobics, or simple to the laggards, or somewhat simpler to consumers.

Task sublimations, simplifications, are essential to taking all the money off the table. If we get too simple too fast, we are leaving money on the table. When we skip the early phases of the TALC and jump into the consumer phase, we are leaving money on the table.

But, being continuous innovations, we don’t bother with creating value chains and careers. They get the technical enthusiasts jobs for a few months. They get some cash. The VCs get their exit. It has to be simple enough for consumers. More simplifications to come. But, the flash in the pan will vanish. Continuous innovations don’t put money on the table. That money is on the floor. Bend your knees when picking it up.

Technical enthusiasts should not cheer when the technology gets simplified. Maybe they need it to get simpler, so they can use it. But, it is going to continue to get simpler. And, real science in the pre-bibliographic maturity stage will be complex or complicated. It won’t get more complicated. It will get simpler. Simpler happens.

That doesn’t mean that everything has to be in the same simplicity slice. It just means that the simplicity must match the population in the phase we sell into.

One complication that doesn’t show up in the diagram is that the TALC is about carrier except in the bowling alley. In the bowling alley, the carried content is what the customer is buying. But, that carried content is a technology of its own, so the carrier TALC and the carried TALC meet in the bowling alley. Each of those technologies gets simpler at its own rate. These intersections show up in late mainstreet when you want to capture more of the business from the vertical populations. This is a real option. But, it will take quite an effort to hold on to the domain-knowledgeable people.

The diagram covers much more ground. Today, we just called out the complicated and the simple.


Fourth Definition of Kurtosis

June 6, 2018

In the Wikipedia topic on Moment, kurtosis being the fourth moment, aka the fourth derivative of the moment-generating function evaluated at zero, Wikipedia says, “The fourth central moment is a measure of the heaviness [or lightness] of the tail of the distribution, compared to the normal distribution of the same variance.” Notice here, no mention of peakedness.

In Yes or No in the Core and Tails II, I included some discussion of mixture models with a two-dimensional graphic that illustrated the summing of two distributions. The sum (red) was said to have a heavy tail (Normals as Constraints). It was interesting to see distributions in a mixture model acting as constraints. I have not been able to confirm that normals in other sums act as constraints. In a mixture model, the weights of the summed normals must add up to 1, so one normal has a weight of p, and the other would have a weight of 1-p. The yellow areas represent the constrained space. The red distribution is sandwiched between the green one and the blue one. The green normal and the blue normal are constraining the red normal.
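
The weighting rule is easy to sketch. Here is a minimal two-component mixture density in plain Python; the means, standard deviations, and the weight p are made up for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    # Gaussian density
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, p, mu1, s1, mu2, s2):
    # Component weights must sum to 1: p and 1 - p
    return p * normal_pdf(x, mu1, s1) + (1 - p) * normal_pdf(x, mu2, s2)

# Illustrative parameters: unit normals at -1 and +2, weight p = 0.3
f = lambda x: mixture_pdf(x, 0.3, -1.0, 1.0, 2.0, 1.0)

# Because the weights sum to 1, the mixture still integrates to 1;
# checked here with a crude Riemann sum over [-10, 10].
area = sum(f(-10 + i * 0.01) for i in range(2001)) * 0.01
print(round(area, 3))   # 1.0
```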

In analysis, distribution theory is not about statistics; distributions serve as substitutes for functions. In linear programming, constraints are functions, so it should be no surprise that distributions act as constraints. Statistics is full of functions like the moment function. Every time you turn around there is a new function describing the distribution. Those functions serve particular purposes.

Another view of the same underlying graph shows these normals to be events on a timeline, the normal timeline. Statistics lives in fear of p-hacking, or waiting around and continuing to collect data until statistical significance is achieved. But, what if you are not doing science? P-hacking wouldn’t pay if the people doing it were trying to make some money selling product, rather than capturing grant money. Statistics takes a batch approach to frequentist statistical inference. Everything is about data sets, aka batches of data, rather than data. But, if we could move from batch to interactive, well, that would be p-hacking. If I’m putting millions on a hypothesis, I won’t be p-hacking. And I won’t use a kurtotic or skewed distribution that will disappear in just a few more data points or the next dataset. That would just be money to lose.

So what is a normal timeline? When n is low, shown by the green line in the figure (Normals as Timeline), labeled A, the normal is tall and, ideally, skinny; ideally, because it is also skewed and kurtotic, which is not shown in this figure. We’ll ignore the skew and kurtosis for the moment. When n is finally high enough to be normal, shown by the red line, it is no longer tall, and not yet short. It is a standard normal. When n is higher, shown by the blue line, labeled B, the distribution is shorter and wider. So we’ve walked a Markov chain around the event of achieving normality and exceeding it. This illustrates a differential normality.

We achieve normality, then we exceed it. This is the stuff of differentials. I’ve talked about the differential geometry previously. We start out with Poisson games on the technology adoption lifecycle. These have us in a hyperbolic geometry. We pretend we are always in a Euclidean space because that is mathematically easy. But, we really are not achieving the Euclidean until our data achieves normality. Once we achieve normality, we don’t want to leave the Euclidean space, but even if we don’t, the world does, our business does. Once the sigma goes up, we find ourselves in a spherical geometry. How can so many businesses exist that sell the same given commodity in a multiplicity of ways? That’s the nature of the geodesic, the metric of spherical geometry. In a Euclidean space, there is one optimal way; in hyperbolic, less than one optimal way; and spherical, many. This is the differential geometry that ties itself to the differential normality. The differential normality that batch statistics, datasets hide. A standing question for me is whether we depart the Euclidean at one sigma or six sigma. I don’t know yet.

As a side note on mixture models like the underlying figure for the figures above, these figures show us normals that have a mean of zero, but their standard deviations differ (Sum of Normals - Different Std Devs). The first standard deviation is at the inflection point on each side of the normal distribution. The underlying figure is tricky because you would think that all three normals intersect at the same inflection point. That might be true if all three had the same standard deviation. Since that is not the case, the inflection points will be in different places. The figure shows the inflection points on one side of the normal. When the distribution is not skewed, the inflection points on the other side of the mean are mirror images.
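
That inflection-point claim is easy to verify numerically: the second derivative of the normal density changes sign at mu ± sigma. A small sketch, with an arbitrary sigma:

```python
import math

def pdf(x, mu=0.0, sigma=2.0):
    # Normal density; sigma = 2 chosen arbitrarily for the check
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def second_derivative(f, x, h=1e-4):
    # Central finite difference
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

# Curvature flips sign at x = sigma = 2: concave inside, convex outside.
inside = second_derivative(pdf, 2.0 - 0.01)
outside = second_derivative(pdf, 2.0 + 0.01)
print(inside < 0 < outside)   # True
```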

Mixture models can involve different distributions, not just normals. Summing is likewise not restricted to distributions having the same mean and standard deviation, or to distributions of the same kind.
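
One distinction worth keeping straight here: mixing distributions (the weighted densities above) is not the same as summing the random variables themselves. A quick simulation sketch, with parameters invented for illustration:

```python
import random, statistics

random.seed(7)
N = 20_000
a = [random.gauss(0, 1) for _ in range(N)]   # N(0, 1)
b = [random.gauss(5, 2) for _ in range(N)]   # N(5, 4)

# Summing independent normals gives one normal: means and variances add.
sums = [x + y for x, y in zip(a, b)]

# Mixing with weight 0.5 picks one component per draw; it can be bimodal.
mix = [x if random.random() < 0.5 else y for x, y in zip(a, b)]

print(round(statistics.fmean(sums), 1))      # near 5.0 (0 + 5)
print(round(statistics.pvariance(sums), 1))  # near 5.0 (1 + 4)
print(round(statistics.fmean(mix), 1))       # near 2.5 (0.5*0 + 0.5*5)
```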

Multivariable normals contain data from numerous dimensions. A single measure is tied to a single dimension. A function maps a measurement in a single dimension into another measurement in another dimension. Each variable in a multivariable normal brings its own measure, dimension, and distribution to the party. That multivariable normal sums each of those normals. Back in my statistics classes, adding normals required that they have the same mean and same standard deviation. That was long ago, longer than I think.




Yes or No in the Core and Tails II

June 4, 2018

The ambiguous middle of my decision tree for my last post “Yes or No in the Core and Tails” has bugged me for a few days. I have a hard time thinking that I drive up to a canyon via a few roads, climb down to the river, cross the river, climb up the other side, and select one of many roads before driving off. That is not a reasonable way to deal with a decision tree that doesn’t get entirely covered by my sample space.

So what is this mess hinting at? Do not stop sampling just because you’ve achieved normality! Keep on sampling until you’ve covered the entire sample space. Figure out what power of 2 will produce a decision tree wide enough to contain the sample space, then sample the entire decision tree. Before normality is achieved, not sampling the entire base of the decision tree generates a skewed normal. This exposes you to skew risk. There will also be some excess kurtosis, which brings with it kurtosis risk.

Here is a quick table (Binary Space vs Normal Sample Size) you can use to find the size of the sample space after you’ve found the number of samples you need to achieve normality. The sample space is a step function. Each step has two constraints.

Given that it takes less than 2048 samples to achieve a normal, that should be the maximum. 2^11 should be the largest binary sample space that you would need, hence the red line. We can’t get more resolution with larger sample spaces.
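
The step function behind the table can be sketched as: the covering binary sample space is the next power of two at or above the sample size, capped at 2^11 per the red line. The exact shape of the table is my assumption; the inputs below are numbers from these posts.

```python
import math

def covering_space(n_samples, cap_depth=11):
    """Smallest full binary tree covering n samples: depth d, 2**d leaves.
    Depth capped at 11, since 2**11 = 2048 leaves already suffice."""
    depth = min(max(math.ceil(math.log2(n_samples)), 0), cap_depth)
    return depth, 2 ** depth

print(covering_space(1800))   # (11, 2048)
print(covering_space(144))    # (8, 256)
print(covering_space(5000))   # (11, 2048), capped by the red line
```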

Note that we are talking about a binary decision in a single dimension. When the number of dimensions increases, the number of nomials will increase. This means that we are summing more than one normal. We will need a Gaussian mixture model when we sum normals. The usual insistence when adding normals is that they have the same mean and standard deviation. Well, they don’t; hence the mixture models.

I took some notes from the Bionic Turtle’s YouTube on Gaussian mixture models. Watch it here.

Gaussian Mixture Model

Back when I was challenging claims that a distribution was binomial, I wondered where the fill between the normals came from. As I watched a ton of videos last night, I realized that the overlapping probability masses at the base had to go somewhere (Probability Mass). I quickly annotated a graph showing the displaced probability mass in dark orange, and the places where the probability mass went in light orange. The areas of the dark orange should sum up to the areas of light orange. The probability masses are moved by a physics.

A 3-D Gaussian mixture model is illustrated next (3D Gaussian Mixture Model). I noted that there are three saddle points. They are playing three games at once, or three optimizations at once. EM clustering is an alternative to the Gaussian mixture model.
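
A minimal 1-D version of that EM procedure can be sketched in plain Python. This is an illustration only: crude initialization, a fixed iteration count, and made-up data with two well-separated components.

```python
import math, random, statistics

def norm_pdf(x, mu, s):
    return math.exp(-((x - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, iters=60):
    # Crude initialization: split the sorted data at the median.
    xs = sorted(xs)
    half = len(xs) // 2
    mu1, mu2 = statistics.fmean(xs[:half]), statistics.fmean(xs[half:])
    s1, s2, w = statistics.pstdev(xs[:half]), statistics.pstdev(xs[half:]), 0.5
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        r = [w * norm_pdf(x, mu1, s1) /
             (w * norm_pdf(x, mu1, s1) + (1 - w) * norm_pdf(x, mu2, s2))
             for x in xs]
        # M-step: re-estimate weight, means, standard deviations
        n1 = sum(r)
        n2 = len(xs) - n1
        w = n1 / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1)
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2)
    return w, (mu1, s1), (mu2, s2)

random.seed(11)
data = ([random.gauss(0, 1) for _ in range(600)] +
        [random.gauss(6, 1.5) for _ in range(400)])
w, c1, c2 = em_two_gaussians(data)
# Recovers roughly: weight 0.6, components near (0, 1) and (6, 1.5)
```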

So to sum it all up, do not stop sampling just because you’ve achieved normality! 




Yes or No in the Core and Tails

June 2, 2018

Right now, I’m looking at how many data points it takes before the dataset achieves normality. I’m using John Cook’s binary outcome sample size calculator and correlating those results with z-scores. The width of the interval matters. The smaller the interval, the larger the sample needed to resolve a single decision. But, once you make the interval wide enough to reduce the number of samples needed, the decision tree is wider as well. The ambiguities seem to be a constant.

A single bit decision requires a standard normal distribution with an interval centered at some z-score. For the core, I centered at the mean of 0 and began with an interval between a=-0.0001 and b=+0.0001. That gives you a probability of 0.0001. It requires a sample size of 1×10^8, or 100,000,000. So Agile that. How many customers did you talk to? Do you have that many customers? Can you even do a hypothesis test with statistical significance on something so small? No. This is the reality of the meaninglessness of the core of a standard normal distribution.
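
Those numbers can be reproduced approximately with the standard binary-outcome sample-size formula, n ≈ z²·p(1-p)/E², where E is the interval half-width; I am assuming this is roughly what Cook’s calculator does. A sketch in plain Python:

```python
import math

def interval_probability(a, b):
    # P(a < Z < b) for a standard normal, via the error function
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return Phi(b) - Phi(a)

def sample_size(half_width, p=0.5, z=1.96):
    # Worst-case p = 0.5; z = 1.96 for 95% confidence
    return z ** 2 * p * (1 - p) / half_width ** 2

print(interval_probability(-0.0001, 0.0001))  # about 8e-5
print(sample_size(0.0001))                    # about 9.6e7, order 1e8
```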

Exploring the core, I generated the data that I plotted in the following diagram.


With intervals across the mean of zero, the sample size is asymptotic to the mean. The smallest interval required the largest sample size. As the interval gets bigger, the sample size decreases. Bits refers to the bits needed to encode the width of the interval. The sample size can also be interpreted as a binary decision tree. That is graphed as a logarithm, the Log of Binary Decisions. This grows as the sample size decreases. The number of samples required to make a single binary decision is vast, while a decision about subtrees requires fewer samples. You can download the Decision Widths and Sample Sizes spreadsheet.

I used this normal distribution calculator to generate the interval data. It has a nice feature that graphs the width of the intervals, which I used as the basis of the dark gray stack of widths.

In the core, we have 2048 binary decisions that we can make with a sample size of 31. We only have probability density for 1800. 248 of those 2048 decisions are empty. Put a different way, we use 11 bits or binary digits, bbbbbbbbbbb, but we have don’t cares at 2^7, 2^6, 2^5, 2^4, and 2^3. This gives us bbbb*****bb, where each b can be a 0 or 1. The value of the don’t cares would be 0 or 1, but their meaning would be indeterminate. The don’t cares let us optimize, but beyond that, they happen because we have a data structure, the standard normal distribution, representing missing but irrelevant data. That missing but irrelevant data still contributes to achieving a normal distribution.
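
The bit pattern above can be expanded mechanically. Each * (don’t care) may be 0 or 1, which yields the 2^5 = 32 concrete codes; the fixed b positions are pinned to an arbitrary value here just to count paths.

```python
from itertools import product

pattern = "bbbb*****bb"   # 11 positions, five don't cares in the middle

def expand(pat, fixed_bit="1"):
    # Every concrete bit string matching the pattern: each '*' ranges
    # over {0, 1}; each 'b' is pinned to fixed_bit for illustration.
    out = []
    for bits in product("01", repeat=pat.count("*")):
        it = iter(bits)
        out.append("".join(next(it) if c == "*" else fixed_bit for c in pat))
    return out

codes = expand(pattern)
print(len(codes))   # 32 = 2**5 paths through the don't-care layers
```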

My first hypothesis was that the tail would be more meaningful than the core. This did not turn out to be the case. It might be that I’m not far enough out on the tail.


Out on the tail, a single bit decision on the same interval centered at x=0.4773 requires a sample size of 36×10^6, or 36,000,000. The peak of the sample size is lower in the tail. Statistical significance can be had at 144 samples.
Core vs Tail

When I graphed the log of the sample sizes for the tail and the core, they were similar, not different as I had expected.
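One hedged way to reproduce that core-versus-tail shape is the relative-precision sample-size formula n = z²(1−p)/(r²p). This is my own sketch, not John Cook's calculator or the one used above; `n_relative` is a name I made up.

```python
import math

# Sample size to estimate a branch probability p to within a RELATIVE
# half-width r * p: n = z^2 * (1 - p) / (r^2 * p). In the core p is large
# and n is modest; in the tail p is tiny and n runs into the tens of
# millions -- yet log(n) changes smoothly, which is why the two log
# plots look similar.
Z = 1.96

def n_relative(p, r=0.1):
    return math.ceil(Z ** 2 * (1 - p) / (r ** 2 * p))

for p in (0.5, 0.1, 0.01, 1e-5):     # from the core out to a tail sliver
    print(p, n_relative(p), round(math.log10(n_relative(p)), 1))
```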

I went back to the core and drew a binary tree for the sample size, 2^11, and the number of binary decisions required. The black base and initial branches of the tree reflect the definite values, while the gray branches reflect the indefinite values, or don’t cares. The dark orange components demonstrate how a complete tree requires more space than the normal. The light orange components are don’t cares of the excess space variety. While I segregated the samples from the excess space, they would be mixed in an unbiased distribution.

Decision Tree

The distribution as shown would be a uniform distribution; in a normal, the data would occur with different frequencies. They would appear as leaves extending below what is now the base. Those leaves would be moved from the base, leaving holes. Those holes would be filled with orange leaves.

Given the 2^7, 2^6, 2^5, 2^4, and 2^3 layers, there is quite a bit of ambiguity as to how one would get from the 2^8 branches to the 2^2 branches of the tree. Machine learning will find them. The artificial intelligence of the 80s would have had problems spanning that space, that ambiguity.

So what does it mean to a product manager? First, avoid the single bit decisions because they will take too long to validate. Second, in a standard normal the data is evenly distributed, so if some number of samples occupies less than the space provided by 2^x bits, they wouldn’t all be in the tail. Third, you cannot sample your way out of ambiguity. Fourth, we’ve taken a frequentist approach here; you probably need to use a Bayesian approach. The Bayesian approach lets you incorporate your prior knowledge into the calculations.
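That Bayesian alternative can be sketched with the conjugate Beta-Binomial update. The prior numbers below are illustrative assumptions, not anything from this post; `beta_posterior` is a hypothetical helper.

```python
# A Beta prior over a yes/no rate, updated by observed successes and
# failures. The posterior mean shifts from the prior toward the data as
# evidence accumulates -- prior knowledge reduces the samples needed
# before committing to a decision.
def beta_posterior(prior_a, prior_b, successes, failures):
    a = prior_a + successes
    b = prior_b + failures
    return a, b, a / (a + b)          # posterior params and posterior mean

# Prior belief that a feature wins about 60% of the time (Beta(6, 4)),
# then 20 trials with 14 yes and 6 no:
print(beta_posterior(6, 4, 14, 6))
```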


Kurtosis, Another Definition

May 31, 2018

Tonight, I came across a third definition of kurtosis. This definition begins at 25:30 in Statistics 101: Is My Data Normal. This source defines kurtosis as a distribution having higher than expected probability mass in the tails. Compare this to the typical definition, the one returned from a Google search, “the sharpness of the peak of a frequency-distribution curve,” which I’ve not used since I found kurtosis to be the curvature of the tails. See More On Skew and Kurtosis. I’m still lost as to how the kurtosis statistic translates into the curvatures of skewed distributions. Complicating the curvature issue is that in an n-dimensional normal, there are more than two tails. There does seem to be a pattern of curvatures defining a torus for a normal without excess kurtosis, or a ring cyclide for a normal with excess kurtosis. The torus sits flatly on top of the tails of a normal, parallel to the base plane. The ring cyclide also sits on top of the tails, but tilted relative to the base plane.

This third definition of kurtosis is nicely quick to grasp. The typical definition seems to be confused with n, the number of data points. With little data, the normal is thin, high, and has two short tails, given the absence of skew. With a lot of data, the normal is wide, lower, and has longer tails, given the absence of skew.
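The tail-mass definition can be checked numerically. A plain-Python sketch, with `excess_kurtosis` as a hypothetical helper; scipy.stats.kurtosis would be the production choice.

```python
import random
import statistics

# Excess kurtosis = m4 / m2**2 - 3 measures tail mass, not peak sharpness.
# It is near 0 for a normal and near -1.2 for a uniform, whose tails are
# thin -- regardless of how many data points we draw.
def excess_kurtosis(xs):
    mu = statistics.fmean(xs)
    m2 = statistics.fmean([(x - mu) ** 2 for x in xs])
    m4 = statistics.fmean([(x - mu) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3

rng = random.Random(0)
normal = [rng.gauss(0, 1) for _ in range(100_000)]
uniform = [rng.uniform(-1, 1) for _ in range(100_000)]
print(round(excess_kurtosis(normal), 2))    # near 0
print(round(excess_kurtosis(uniform), 2))   # near -1.2
```

This is why the statistic doesn't track n: more data changes the height and width of the histogram, not the moment ratio.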

I have not gotten to topological data analysis and the issues of what the torus or ring cyclide is telling us.


The Technology Adoption Lifecycle

May 24, 2018

A while back I wrote about all the so-called Chasms. These days we begin our continuous innovations in the late Main Street. Nobody crosses the Chasm.

I was watching data from a pseudorandom generator for a normal distribution, waiting for it to converge to a normal. That is supposed to happen by the time you have 36 data points. It didn’t happen. And, it didn’t happen by the time I plotted 50 data points. It didn’t help that I had to generate more data after the first 36 data points.

I made a mistake. Each call of the generator starts the process off with a new seed, aka a new distribution, so of course, it doesn’t converge. I’m not liking this dataset mindset of statistics. I’m not p hunting. I’m trying to validate a decision made in the Agile development process. I don’t have all day, but apparently, I have a week. Claims about fast discovery turn out to be bunk. A friend of mine suggested taking a Bayesian approach instead.
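The mistake can be sketched in miniature. This is my own reconstruction, not the generator I actually used, and it fixes the seed to make the failure obvious.

```python
import random

# Re-seeding on every call restarts the stream, so the draws never behave
# like one growing sample; a single generator, seeded once, converges
# toward the true mean as n grows.
def draw_with_fresh_seed(seed):
    return random.Random(seed).gauss(0, 1)   # new generator each call: the bug

buggy = [draw_with_fresh_seed(7) for _ in range(1000)]   # same value 1000 times
one_rng = random.Random(42)                              # seeded once
good = [one_rng.gauss(0, 1) for _ in range(1000)]
print(len(set(buggy)), sum(good) / len(good))            # 1 distinct value; mean near 0
```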

Through some now forgotten thought process, I was plotting sigmas, z-scores, and the like. That brought me back to some details of the technology adoption lifecycle (TALC). So I Googled it and found a whole lot of graphs of it that were just flat out wrong. No wonder everyone is confused about the Chasm. They are using one of the revised (wrongly drawn) figures. So I’ll show you some of the figures, point out the errors, and draw an older, more correct view.

The misstatements seem to be sourced from Geoffrey Moore. When he moved into the late phases when the dot bust happened, he set about making the TALC relevant to the late phases and the biz orthodoxy. He has taken back most of the claims he made in his prior version of the TALC. It’s all disappointing.

One thing Moore said back in the beginning of his TALC, not Rogers’s version, was that it was not a clock. I always thought he meant not an asynchronous clock, aka not like email. No, what he meant was we can choose to enter any phase we want. That leaves money on the table, but it accurately reflected what businesses do. This very characteristic means that businesses can completely skip the Chasm, the bowling alley, and his first tornado. Yes, some acquiring companies skip the second tornado or just suck at it, so the acquisition fails. Mostly, acquisitions don’t even try to succeed. The VCs got their exit, that being the whole point of most VC investments these days.

Once you skip over the processes that are Moore’s contribution to technology adoption, people feel free to just fall back to Rogers, a solely sociological collection of populations. Moore took Rogers someplace else. Yes, Rogers didn’t see the Chasm. But, Moore didn’t see Myerson’s Poisson games. The underlying model changed over time. I’ve modified the model myself. But, Moore’s processes didn’t move.

So let’s look at the mess.

01 TALC 2018

Figures from

  4. Adoption-Lifecycle.png

I’m just citing the sources of the figures. They probably copied them from others that copied them. I’m not assigning blame. But, this very small sample demonstrates the sources of confusion about the Chasm.


  • In figures 1, 2, 3, and 5, the first phase is called “Innovators.” Well, no. The inventors happened a long time before the technology adoption lifecycle began. The word “innovators” is indicative of management. In the earlier texts, this population was called technical enthusiasts. They are engineers, not business people. And, in the bowling alley and vertical sense, they were programmers known to the early adopter for the given vertical.
  • In figure 2, the gray graph behind the technology adoption lifecycle has an axis labeled “Market Share.” No, in no way is a technology firm allowed to capture 100% of the market share. The maximum is 74%. After that, you have a monopoly and your business is in violation of antitrust law. The EU is probably stricter than the US. That 74% is the US threshold.
  • In figures 1, 2, 3, and 5, the second phase is called the “Early Adopters.” Under Moore’s version, this phase is more accurately called the bowling alley. It is where we sell into the vertical markets by selling to one B2B early adopter in each vertical. We would enter six verticals with a product conceived by the early adopter. That product would be built on the technology we are trying to get adopted. Products are just the means of getting the underlying technology adopted. The product visualization is the early adopter’s alone. The idea is not ours. We sell to six early adopters. This takes time. There is no hurry. We have to ensure that each of these six early adopters achieves their intended business advantage.
  • The population percentages for each phase are accurate in figure 3.
  • In figure 4, the Chasm is correctly placed, but the early adopters are to the left, aka before the Chasm, and their vertical is to the right. It is not accurate to call the entire phase where the Chasm occurs the early adopters. There is a two-degrees-of-separation network between the early adopter and their vertical. Sales reps find no particular advantage in attempting to sell to a third degree of separation. Selling to that network constitutes the central issue of the Chasm.
  • Figure 4 also splits the early and late majorities in the wrong place.
  • In figure 5, the Chasm is incorrectly placed. The Early Majority is really the horizontal, usually the IT horizontal. The Tornado sits at the entrance of this phase, the horizontal, not the Chasm. The Chasm sits at the entrance of the verticals.

One of the problems that Moore encountered was the inability of managers to know where they were in the TALC. These figures do not agree with each other, so how would managers using different versions come to agree?

I’ve made my own changes to the TALC. First, the left convergence of the normal is well after the R&D, aka the science and engineering research that firms no longer engage in. The left convergence is long after the research has gained bibliographic maturity. The left convergence only happens when researchers with Ph.D.s and master’s degrees decide to innovate after having invented. They happen long before the TALC. This doesn’t look like how we innovate these days. These days we innovate in the late phases, in a scientific- and engineering-free, idea-driven manner, with design thinking innovating around the thinnest of ideas. The early phases, those before the late majority, start with discontinuous innovation. These days, in the phases after the early majority, we innovate continuously. We don’t try to change the world. We are happy to fit in and replicate as directed by the advertising-driven VCs. The VCs demand exits so quickly that we couldn’t change the world if we wanted to.

The second change was in the placement of the technical enthusiasts. They are a layer below the entire TALC. They are the market in the IT horizontal. But, they work everywhere.

The third change involves integration with my software as media model. Each phase changes its role as a media. A media has a carrier and some carried content. All software involves the stuff used to model, and the content being modeled. Artists use pens, inks, paints, brushes, and paper. Developers use hardware, software, code, … Artists deliver a message. Developers deliver a message at times more obvious than at other times.

The fourth change is my labeling the laggards as the device market and the phobics as the cloud. I do this because these populations do not want their technology use to be obvious. The phobics use technology all the time, but with deniability. They use their car, not the computer that runs the car. Task sublimation and pragmatism organize the TALC. The phobics get peak task sublimation. This is where the technology disappears completely outside of the technical enthusiast population.

Here is a revised view of the TALC that incorporates my extensions and changes.

02 Revised TALC

The end is near. The underlying technologies disappear at the convergence on the right. Then, we will need new categories that we can only build from discontinuous innovation. If you don’t read the journals, you won’t see it coming. And, if you spent your life doing continuous innovation, you won’t be able to innovate discontinuously.

Another figure out on Google correlates Gartner’s Hype Cycle with the TALC. But, this one is absolutely wrong. Gartner has nothing to say about technologies in the vertical. Gartner starts with the IT horizontal. If the horizontal is not the IT horizontal, Gartner has nothing to do with the TALC. The Chasm happens a long time before the Trough of Disillusionment. The Hype Cycle starts at the tornado that sits at the entry into the IT horizontal.


I’ve made the necessary adjustment in the following figure. The Hype Cycle does manifest itself in the IT Horizontal and all subsequent phases. One Hype Cycle does not cross from one TALC phase to another. Each phase has its own hype cycle. I’ve only shown the hype cycle for the IT Horizontal.

The original figure was found in a Google image search. It was sourced from

The reason I moved the Hype Cycle is that in the search for clients in the vertical, IT is specifically omitted, and IT is not involved in the project. The client has to have enough pull to keep IT out. The clients would be managers of business units or functional units other than the eventual intended horizontal that you would enter in the next phase. The Chasm and the early adopter problems discussed relative to the earlier graphics are apparent here.

The second tornado came up in Moore’s post web 1.0 work. It happens after a purchase but before integration. The VCs get their money on completion of the purchase. The acquiring company gets value from the M&A only after the integration attempt succeeds. The AT&T acquisition of DirecTV had a very long tornado. That tornado is probably done by now. Most M&As fail. Many M&As are done solely to ensure the VCs recover their money. These are not done because the acquired company will generate a return for the acquirer. The underlying company fades into oblivion shortly after the acquisition. I’ve put both tornados in the next graphic. The timing of the M&A is independent of phase.


In most figures, the acquiring company is shown moving upwards from the M&A. That is incorrect. The acquiring company is post-peak, post early majority and is in permanent decline. The best that can happen is that the convergence on the right will be moved further to the right granting the acquirer more time before the category dies. The green area in the figure reflects the gains from a successful integration, which happens to require a successful second tornado.

What was not shown was the relation of the first tornado to an IPO that pays a premium. That only happens with discontinuous innovation, and only in the early phases of the TALC. With the innovations we do these days, we are in the late phases of the TALC, so there is no premium on the IPO.  Facebook did not get a premium on their IPO.

One aspect of today’s TALC that I have not worked out is how the stack of the IT horizontal is cannibalized by the cloud.

Back when I gave my SlideShare presentation in Seattle in 2009, a lot of people didn’t feel that the TALC was relevant. It was still relevant then. It is still relevant now. We leave much money on the table by rushing, by being where everyone else is, by quoting the leaders of the early phases while we work in the late phases. We settle for cash, instead of the economic wealth garnered by changing the world. If we set out to change the world, the TALC is the way.






Generative from Constraints, a Visualization

May 23, 2018

I came across a tweet from Antonio Gutierrez. Several constraints on a plane form a triangle. That triangle could have been a point before the constraints were loosened enough to give us some space within that triangle. More constraints would just give us a different polygon.

The loosened constraints required some room for continuous innovation. The point that became the triangle could be thought of as a “YET” opportunity, a problem that couldn’t be solved yet. But, with the triangle, the opportunity awaits. So we dive in from some point of view where we can see the point at some distance. We establish a baseline from the point of our view, the origin, to the center of the triangle. From that origin, we project three lines up to and beyond the triangle. This volume is code. At some point above the constraint plane, we take a slice through that volume of code, the blue triangle, and ship it. We continue to work outward. This would involve very little rework.

Alas, things change. The constraints contract (red arrows) causing us rework, or widen (green arrows) to give us space for new opportunities. The black triangle at the intersections of the constraints could widen or contract in parallel to our current boundaries (black arrows). Or, we could move our origin up or down to widen or narrow our current projection. That’s three classes of change. Each class gives us different volumes to fill.

In my game-theoretic illustrations, the release is always in a face-off with the requirements; such is the nature of design in the axiomatic sense of requirements from the carried content as assertions balanced against the enabling and disabling elements of the carrier technology. The projection doesn’t go hockey-stick-like into the constraints of the underlying geometry. There is always a constraint up there that’s much closer than we’d like to admit. Goldratt insists that there is always another constraint. And, in hyperbolic geometry, there is always a convergence at the nearby infinity.

In another view, the first line (red) from the origin through the center of the triangle and out into space is where we start the underlying technology. It grows outward, thickening the line into a solid with the pink triangle as the base of the carrier technology. The carried content is built outward from the carrier core.

Constant change can be managed. Moving the origin down contracts the code volume. Moving G towards B contracts the code volume. Moving E towards A contracts the code volume. And, moving F towards C contracts the code. You can know before you code where rework is required and where your opportunities are to be found.

I’ve kept this simple. You can imagine that your carrier and your carried content have their own constraints, timeframes, and rates. There would be two planes, two centerlines, and two triangular solids intersecting on the plane representing what we will ship. We could slip in a plane to project onto and out from. Oh, well.



Holes II

May 8, 2018

This week I revisited fractional calculus. A few months ago, someone on Twitter tweeted a link to a book on fractional calculus. I didn’t get far. My computer crashed, so I lost my browser tabs. I didn’t reload them, because I had so many that the browser was slowly doing its job, which apparently is collecting vast numbers of tabs of readme wannabes.

The topic came up again. I’m not sure the original link got me to the Chalkdust article, or if I had to Google it. The content was less complete, and not historical at all. But, you come away with two methods of getting the job done.

The article ended with a graphic that blew me away when I looked at it from the perspective of discontinuous innovation. The discontinuity is large. It went on to hide, you might say, another discontinuity. I’m always asked what discontinuities are. I try never to give the mathematical answer to that question. The Wright brothers were not math equations.

So here is the figure from the article. Do you see the discontinuities? The first one is glaring if you’re always looking for and needing discontinuities. Much like the discontinuities that the Mittag-Leffler Theorem, discussed in my last post, Holes, lets us generate, discontinuities are essential to discontinuous innovation. There is profit in those holes. They are profit beyond the cash plays of continuous innovation, the profit of economic wealth that accumulates to the whole, the “we,” not just to the “me.” They are profit in the sense of new value chains, new careers, and revised ways to do jobs to be done.

I marked the figure up to uncover the discontinuities. We can start with the plane ABCD. The plane is outlined with a thin blue line containing the red surface from which the differentiation process departs. I drew some thick red lines to outline the hole where the process lifts the differentiation process above the plane.

There is a shadow that is visible through the front surface of the process. It was visible in the original graph. Highlighting it hides it. The thin orange lines highlight that surface.

D8 and D9 do not intersect. The third dimension lets them slide by each other without intersecting. When confronted with an intersection of constraints, look for a dimension that separates them, or look for a geometry that separates them. As product managers, we just have to look for the mathematicians and scientists that separate them. Product has always been about breaking or bending a constraint. Here we broke one. It looks like all we did was bend a constraint as of yet.

The hole is on the floor of the atrium, not on the canvas comprising the surface of the tent.  I drew a line parallel to the y-axis and put a hole on it so we could see the discontinuity. It’s not a hole that is a point. It is an area, an area on the plane. I drew a gray line across the plane to characterize the hole on that line. These scan lines don’t have to be parallel or orthogonal to the x-axis, but a polar or complex space would not simplify what we are doing here.

Everything under the surface of the graph and above the original plane is the hole. Another plane would characterize the hole differently.

That’s the first discontinuity.

Having read the article, I know that fractional derivatives involve deriving and then adding an approximation of the fractional component, or deriving past the integer power and subtracting the fractional component. In integer calculus, it’s all about functions until you get to a constant, a number. And, when you get a constant of zero, you’re done. There is a wall there. There is a hole on the other side of that wall into which no mathematics I know goes to take a swim. Yes, the differentials can be negative. We call that process integration. But, the switch between analysis and the approximation by the Gamma function is significant as is the switch between analysis and number theory.
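For power functions, the Gamma-function route the article takes comes down to the standard formula D^a x^k = Γ(k+1)/Γ(k+1−a)·x^(k−a). A minimal sketch; `frac_deriv_power` is a name I made up.

```python
from math import gamma

# The Gamma-function route for power functions:
#   D^a x^k = Gamma(k + 1) / Gamma(k + 1 - a) * x^(k - a)
# With a = 1 this recovers the ordinary derivative; fractional a
# interpolates between integer orders.
def frac_deriv_power(k, a, x):
    return gamma(k + 1) / gamma(k + 1 - a) * x ** (k - a)

print(frac_deriv_power(2, 1.0, 3.0))    # d/dx x^2 at x = 3 -> 6.0
print(frac_deriv_power(2, 0.5, 3.0))    # half-derivative of x^2 at x = 3
```

Note that `gamma` blows up at non-positive integers, which is one face of the wall at zero discussed above.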

I drew an axis above the graph, in the sense of derivatives only, omitting integration, and projected the boundaries between equations, numbers, and zero. At zero, the zero deflects integration when zero is a number, rather than a function with the value of zero. It’s a gate. When that zero is the value of a function, integration passes unimpeded into the negative differential region.

Most of the time the “Does not exist” answer just means that we don’t know the math yet. Yes, we cannot divide by zero until calculus class; then we divide by zero all the time. The Mittag-Leffler theorem welcomes us to put holes where we need them. The mathematics is simpler without holes, so mathematicians sought to get rid of them. But, as product managers, we need our holes, if we are commercializing discontinuous innovation.

On our plane, point D sits at the far left, where we’ve gone to number. The second hole is to the left of the orange line I projected up to our function-number axis. I don’t yet know what’s on the other side of that line. Now, I’ll have to go there.