Followup on The Dance of a Normal

January 15, 2018

When I wrote the post, The Dance of a Normal: Data Quantity and Dimensionality, I didn’t tie it back into product management. My bad. I’ll do that here. I was reminded that I needed to get that done by John D. Cook’s blog post, “Big data is not enough.”

When we construct a normal from scratch, we need 20 data points before we can estimate the normal, and then make any inferences with that normal. That’s 20 data points of the same measurement in the same dimension. If that measurement involves data fusion and we change that fusion, we have a different measurement, so we need to segregate the data. If we change a policy or procedure, those changes will change our numbers, or basis. If we change our pragmatism slice, those numbers will change. If we had enough data, each of those changes would be an alternative hypothesis of its own. Hopefully, they would intersect each other so we could test each of those hypotheses for correlation. But, we can’t just aggregate them and expect to make valid conclusions even if we now have 80 data points and a normal.

With those 20 data points, we have a histogram. We will also have kurtosis when we tell our tools to generate a normal with those 20 data points. We will have to check to see how many nomials we have. Each nomial will have a mean, median, and mode of its own. Those medians lean. Those medians remain the statistic of centrality while the mode and mean move out into the skew.

While you can estimate a normal from 20 data points, don’t expect it to be the answer. There is more work to be done. There is more logic involved. There is more Agile development to do. Don’t move on to the next thing until you have 36 data points for that dimension. If you release some new code, start the data point count over. This implies slack.

When I was managing projects, the mean would converge. When you see the same mean several days in a row, you’ve converged. Throw the data out and collect new data. Once the data converges, it is hard to move the number. Your performance might have changed, but the number hasn’t. Things hide in averages.
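Here is a minimal sketch of why a converged mean is hard to move. The daily figures are made up for illustration (60 days at 10 units, then a jump to 12), and `running_means` is a hypothetical helper, not something from a tool I used.

```python
def running_means(values):
    """Return the cumulative mean after each observation."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


# Hypothetical data: 60 days at 10 units/day, then performance jumps to 12.
history = [10.0] * 60 + [12.0] * 5

pooled = running_means(history)      # keep everything in one average
fresh = running_means(history[60:])  # throw the old data out and start over

print(round(pooled[-1], 2))  # 10.15, the pooled mean has barely moved
print(round(fresh[-1], 2))   # 12.0, the fresh count shows the new level
```

Five days of changed performance move the pooled mean by about 1.5 percent. The fresh count shows the change immediately, which is the case for starting the count over.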

Beware of dimensions. A unit of measure could be more than one dimensional when it’s used in different measurements. What is the logic of this sensor versus another? What is the logic of the illuminator? What is the logic of the mathematics? Are we assuming things? A change in any of that brings us to a new dimension. Write down the definition of each dimension.

The statistics for each dimension and each measurement take time to reach validity. The rush to production, to release, to iteration leaves us with much invalidity until we reach validity. The numbers your analytics kick out won’t clue you in. Kurtosis can give you a hint if it is not swamped. Slow down.

Once you have achieved normality with a measurement, how many sigmas do you have: 1, 3, 6, >6, 60? At three, your underlying geometry changes from Euclidean to spherical. Your business will change when your sigma is greater than six. You will have more competition and the number of fast followers will explode.

Adding data points will change the normal, which in turn changes the outliers. This will be even more the case when you attend to the changes to your dimensions and measures, and your TALC phases and pragmatism slices. The carried and carrier will have their own dimensions and measures. They will also have different priorities, and levels of effort. When moving from a carried layer to a carrier layer, the outliers would be different, because the carried and carrier have their own normal distributions, each with their own dimensions and measures. The emphasis changes, so the statistics change. The populations across the stack differ widely.
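A small sketch of the first sentence, that new data points change the normal and therefore the outliers. The numbers are invented, and the two-standard-deviation cutoff in the hypothetical `outliers` helper is an assumption for illustration, not a recommendation.

```python
import statistics


def outliers(values, k=2.0):
    """Return the values more than k sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]


# Hypothetical measurements: an early batch, then a later batch taken after a change.
early = [10, 11, 9, 10, 12, 10, 11, 9, 10, 16]
later = [15, 17, 16, 18, 14, 17, 15, 16]
combined = early + later

print(outliers(early))     # [16], the 16 stands out against the early normal
print(outliers(combined))  # [], once pooled, the same 16 is no longer an outlier
```

The pooled batch also hides the fact that there are now two populations, which is why the data has to be segregated by measurement before anyone talks about outliers.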

So much mess can be made with metrics. Gaps in the data happen. The past hangs around to assert itself in the future. When you drive down a road, adjacent houses can be from different decades. Data is likewise. The infrastructure helps eliminate gaps and the misallocation of data. It’s not as simple as a measure to manage; you have to manage to measure.




Burst-and-Coast Swimming

January 13, 2018

Twitter brought this article to my attention:

 “In contrast with previous experimental works, we find that both attraction and alignment behaviors control the reaction of fish to a neighbor. We then exploit these results to build a model of spontaneous burst-and-coast swimming and interactions of fish, with all parameters being estimated or directly measured from experiments.”


“We disentangle, quantify, and model the interactions involved in the control and coordination of burst-and-coast swimming in the fish Hemigrammus rhodostomus. We find that the interactions of fish with the arena-wall result in avoidance behavior and those with a neighbor result in a combination of attraction and alignment behaviors whose effects depend on the distance and the relative position and orientation to that neighbor. Then we show that a model entirely based on experimental data confirms that the combination of these individual-level interactions quantitatively reproduces the dynamics of swimming of a single fish, the coordinated motion in groups of two fish, and the consequences of interactions on their spatial distribution.”

“Disentangling and modeling interactions in fish with burst-and-coast swimming reveal distinct alignment and attraction behaviors,” Daniel S. Calovi,

So what does this have to do with product management? It boils down to the technology adoption lifecycle (TALC). It basically describes organizing behavior, the organizing behavior of clients, customers, users, and markets.

Burst-and-coast swimming happens in the buy. An initial sale is a big effort on the part of the buyer and seller. Back when we sold software, the initial sale generated a large commission for the sales rep, and the subsequent upgrade sales generated smaller commissions: burst-and-coast. Then, came the install, burst-and-coast. Once we get our own software installed, we use it and hope we never have to deal with the guts of the application ever again. Well, that tells you, I’m not a hacker or a developer. If the effort is too high, I bail. Sorry, my bad.

Attraction and alignment behaviors control the reaction of a vendor to neighboring vendors. And, the customers do likewise. Once you get that first B2B early adopter in the first pragmatism slice, the client, and start selling to the first degree of separation prospects in the adjacent pragmatism slice, you see this burst-and-coast behavior. In the market, the followers follow the leader, the peak predator. The vendor and the vendor value-chain members do the same. Even fast followers don’t get ahead of the leader. The leader pays a price to lead. They own the burst. The fast follower doesn’t have the capabilities it needs to be the leader. The pragmatism slices are the arena walls. Address only the next pragmatism slice, not the current one, or the past ones. The pragmatism slices are not random.

For each adjacent slice and subsequent adjacencies, the business model must convince, so even the arguments, the explanations burst-and-coast.   During that initial client engagement we build the client’s product visualization with our underlying technology, then in preparation for the chasm crossing, we build the first business case, and we have to help ensure the client achieves that business case. The chasm crossing is one of those arena walls sitting between two pragmatism slices that is problematic enough to warrant elevation to a TALC phase. We have plenty of time to ensure the business case. We are constantly addressing that business case.  “Software By Numbers,” describes how the client engagement should proceed. With each release, we have to convince the funding early-adopter client to fund the next release. Each release has to make the case for the next release. We cannot deliver the whole thing all at once. With Agile, we make the case at the level of each feature making its own case. So we have a school of code that heads off in a single direction effortlessly or a school of developers or a school of vendors and value chains. But, the convincing case for the client is not persuasion. It’s an obvious pathway in the functionality that takes the client to their value proposition by enhancing their competitive position, their place in the larger school of fish.

In the B2B early adopter phase, we are focused on the carried content. We have to build the carrier functionality at the same time, but that is in the background. The first business case is specific to the client and their role in the industrial classification tree, their specific subtree, no higher, lower, or wider. Care must be taken to keep the subtree small in the beginning. Stage-gate on the subtree. How much is enough? The big picture is hyperbolically away, and looking deceptively, and unsustainably small. You’re ten years away, this is no overnight unicorn. But, back to the fish, when you deliver to two channels: the carrier, and the carried, you have two schools of fish. They don’t swim together. There are arena walls.

In the next phase, the vertical market, there are more fish and more pragmatism slices. Sales will be random. Sales will ignore those arena walls. They are chasing money, not product evolution, subtree focus, or the future. Yes, we must pay for today and tomorrow, but outliers are costly and not an opportunity to swamp all the other considerations. Having one big customer is a bad, but attractive-looking, proposition. The vertical itself is an opportunity. Plenty of companies started, did business, and exited within this phase, never moving to the next phase. Companies lived comfortably in the vertical. It’s their school of fish. They go together.

Preparing for the next phase, the horizontal, or the IT horizontal, requires us to shift from carrier to carried. Preparation should have started years or months ago. Worse, this phase will be about aggregating all the companies written on top of the same underlying technology. Vertical products will be rewritten as templates on a single carrier. In the vertical, the carrier should have stayed unified and similar. It’s a different architecture. The school of fish, the customers, will be much larger and wider, but the customer is now the IT department. The previous customer was the non-IT, business unit executive.

Yes, this is not what we do today. These days everything is continuous, small, short-term financed, exiting soon, not changing the world stuff. But, it is also, not what they did yesterday either. I’m writing from the full-TALC course, not entering late-market only, aka starting in the middle, or near the end. The problems we face today from globalism won’t be solved with the innovations we do these days, the continuous stuff; the science and engineering-free, management only innovation. So start something that will last 50 plus years that starts at the beginning and exploits every phase of the TALC until it ends up in the cloud.

Discontinuous innovations give rise to entirely new value chains, new careers, unimagined futures, and sociological problems that we did not address during the youth of the software age. We are older now. We are more orthodox now. Yes, that business orthodoxy is a school of fish. We used to swim outside those fish, but they have joined us because our venture funding still works, while theirs, the banks, don’t make loans like they used to. The school of fish that are banks has moved on, so the orthodoxy sees the innovators as prey, and we apparently agree. We have not pushed back and said, hey this donut shop is not an innovation. But, we are being taught a buzzworded definition of innovation.

Anyway, you grasp what I’m calling the burst and coast, the never before thought and the commodity, the innovator and the orthodoxy. Many fish of many species, all self-organizing and structured in ways that are difficult to see. We will meet many along the way.

And, one last thing, read widely. The last thing that will teach us anything these days is the innovation press. Always ask yourself what can this teach me regardless of what you’re focused on these days.



The Dance of a Normal: Data Quantity and Dimensionality

January 10, 2018

Last night I read a post on John Cook’s blog, “Formal methods let you explore the corners.” In it, he mentioned how in a cube with high dimensionality, most of the volume is in the corners. He put a circle in a box to illustrate his point.

Last week there was a figure illustrating this. As the sphere gained additional dimensions, it became more cube-like. Given a normal distribution looks like a circle when viewed from the top down, I drew what a high dimension normal would look like as it moved from n=1, to n>36, and dimensionality moved from 0 to high dimensionality. Of course, I threw in a few other concepts, so the figure moved from this goal to more. That’s drawing to think.
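The circle-in-a-box picture can be checked with a few lines of arithmetic. This is a sketch using the standard volume formula for a d-dimensional ball of radius 1 inscribed in a cube of side 2. The function name `ball_fraction_of_cube` is mine, not Cook's.

```python
import math


def ball_fraction_of_cube(d):
    """Fraction of a d-dimensional cube's volume occupied by its inscribed ball."""
    # Unit-radius ball volume: pi^(d/2) / Gamma(d/2 + 1); the enclosing cube has side 2.
    return math.pi ** (d / 2) / (math.gamma(d / 2 + 1) * 2 ** d)


for d in (1, 2, 3, 5, 10, 20):
    inside = ball_fraction_of_cube(d)
    print(f"d={d:2d}  ball={inside:.6f}  corners={1 - inside:.6f}")
```

At d=2 the circle covers about 79 percent of the square. By d=10 the ball holds well under 1 percent of the cube, so nearly all of the volume sits out in the corners.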

Increases in data points and dimensionality

Low and High Dimension Normals

The figure on the left provides a top-down view of the normal, as a circle, as n increases, and dimensionality increases.

At n=1, the distribution starts off with a Dirac function exploding off to infinity. The height of the line representing the Dirac function contains the entire probability mass of the distribution. Given the usual z-score representation, a line by itself can’t be a probability because we need an interval. The second data point will show up quick enough to put an end to that quandary. Exhale. The point representing the line of the Dirac function is in the center of the concentric circles under the wider purple line.

A few more data points arrive. At n=5, we have a cluster around the black point in the center of the concentric circles. Here some of the probability mass has flowed down to the x-axis and outward into the distribution. That distribution is not normal yet. These data points would present a Poisson distribution or a set of histogram bars on a line. Here that line would be a curve. But, the data points would be curved. This cluster is shown with darker gray.

At n=20, the Poisson distribution would tend to the normal. This normal is comprised of a core and tails. These are concentric circles centered at the mean, the black point in the center, the point representing the line of the former Dirac function. The core is shown as a lighter gray circle; the tail, the lightest gray. As the number of data points increases, the width of the core and the tails grow.

As the number of data points grows, the normal distribution loses height and the probability mass that comprised that height moves outward in the core and the tails. Black arrows to the right of the mean show this outward movement. The circles representing the core and the tails get wider. Once there are 36 data points, the width and height of the normal stabilize. As more data points are added, not much will change.

All of these changes in width and height were relative to a low number of dimensions. When you have fewer than 36 data points, the distribution would be skewed. This is ignored in the above figure. But, each dimension would be skewed initially and become normal as data points for that dimension are added. This figure is drawn from the perspective of a normal with more than 36 data points, hence no skew. Skew would appear in a top-down view as an ellipse.

Consider each dimension as having its own normal. Those normals are added together as we go. I do not know where the threshold between low dimension and high dimension normals would be. The high dimension normal footprint is shown as a rounded off square, or a squared off circle. It is pink. The corners get sharper as the number of dimensions increases. A black, double-sided arrow indicates the boundary between low dimension and high dimension footprints.

I used a light blue circle to demonstrate how the density in the high dimensional normal is not even. When a tail ends up in the corner, it is longer and the circle tangent to the normal curve is bigger. When a tail ends up on the side, it is shorter and the circle tangent to the normal curve is smaller. These black circles and ellipses represent intrinsic curvatures, or kurtosis, each given by the inverse of their radius, of the tails.

The normal we are used to viewing is a two-dimensional slice through the mean, so we have two tails. In a three-dimensional normal, we can rotate the slicing plane through the mean and get another two tails. With the standard normal, all the slices would look the same. The tails would be the same. The circles representing the intrinsic curvatures would be the same. But, when the normal is skewed, the slices would differ.  The tails would differ with one side being longer than the other. The circles representing the intrinsic curvatures would differ as well. The shorter tail would give us a smaller, tighter circle. The longer tail would give us a larger, looser circle.

If we rotated our slicing around the normal through the mean, in the high dimensionality situation, we would see the tails being the same on both sides, but each slice would have tails of different length. In the low dimensionality situation, the tails would be the same in all slices.

The intrinsic curvatures are shown in black on the left side of the normal. I’ve put red spheres inside those curvatures to hint at the topological object, the aggregate of those spheres, shown with the thick red lines, lying on top of the normal.

The pink footprint meets the light blue circle at the rounded off corners of the footprint of the high dimensional normal but diverges at the sides. There is no probability mass at the sides as it flowed into or was pushed into the distribution envelope suggesting higher densities inside the distribution along the sides. The light blue arrows indicate this.

The corners have the longest tails. The sides have the shortest tails. Given that the slices made by the planes slicing the distribution through the mean are symmetric, the tails are the same on both sides of the mean.

 Black Swans and Flow of Probability Density

In the center figure, I showed the usual side view of the normal. I drew two pink lines to show where the high dimensional footprint ended. The high dimensional footprint has less width except at the corners, so rotating the high dimensional normal relative to the low dimensional normal would move those pink lines. This reflects a risk mechanism similar to skew risk and kurtosis risk.

I projected those pink lines. Superimposing the low and high dimensional normals presents us with two black swans if we go with the x-axis, or two shorter tails if we go with the x’-axis.  The two black swans appear as cliffs, the horizontal lines between the x and the x’ axes. The length of those lines represents the thickness of the bit loss. The tail volumes were lost between the pink lines and the outer gray lines labeled Prior Tail. The blue rectangles beside the distribution indicate where the tail volumes were lost. Where tail volumes were lost, we renormalize the distribution. In a high dimensional normal, these volumes would be small. These volumes contain probability masses in low dimensional normals. In high dimensional normals, all the probability densities are on the surface of the distribution.

In the center figure, I’ve used thin, light blue arrows to clarify the flow of probability density from the Dirac function into the normal.

Intrinsic Curvature

The figure on the right illustrates, with a side view of the normal, the effects of skew and the presence of a torus or, more accurately, a ring cyclide. I first discussed this ring cyclide in The Curvature Donut.

The purple line in the figure on the left represents the x-axis of the horizontal view of the normal. On the figure on the right, the purple line is the x-axis. I used a standard normal but added the circles representing the intrinsic curvatures in red. Since the standard normal is symmetric, both of the outer intrinsic curvatures are the same size. This symmetric situation gives us a torus topologically. The torus sits flatly on top of the tails of the normal. This is the high dimension and the high number of data point cases. Then, I hinted at a skewed distribution, aka the low number of data point cases, with the angled line of the median and a short tail. That short tail would have a smaller circle representing its intrinsic curvature. This gives us a ring cyclide topologically. The ring cyclide sits tilted on the tail of the normal.

I then superimposed the smaller circle and larger one from the skewed situation. The smaller circle represents maximal curvature; the larger circle, the minimal curvature. Then, with the black circle, I averaged the two so I could get down to one kurtosis value. Kurtosis is one number. You might tell me that it represents the kurtosis of the standard normal, but skew is tied to kurtosis, so there should be two numbers, since they would not be equal. This average business is just my guess. I still don’t see the height of the distribution as being indicated by kurtosis. Still wondering.

Kurtosis as peakedness is stated all over the literature as the ground truth, but a few authors and I say that that doesn’t make sense. The fourth moment, the calculus definition of kurtosis, has nothing to do with peakedness.
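For reference, here is a minimal sketch of the moment definitions: skewness is the third standardized moment and kurtosis is the fourth. The helper names and the sample data are mine, chosen only for illustration, and the population standard deviation is an assumption (sample-corrected estimators differ slightly).

```python
import statistics


def standardized_moment(values, order):
    """Average of ((x - mean) / sd) ** order, using the population sd."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** order for v in values) / len(values)


def skewness(values):
    return standardized_moment(values, 3)


def kurtosis(values):
    return standardized_moment(values, 4)


symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 1, 2, 10]

print(round(skewness(symmetric), 6), round(kurtosis(symmetric), 6))
print(round(skewness(long_right_tail), 6), round(kurtosis(long_right_tail), 6))
```

Because each deviation is raised to the fourth power, the points far from the mean dominate the sum. The number reports on the tails, not on the height of the peak.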

The x and x’ axes here are accidental. The black swans show up as well. Again accidental.

I left a comment about multidimensional normals in John Cook’s blog. He replied while I was writing this. I will have to think about it a while, and I may have to revise this. See his Willie Sutton and multivariate normal distribution.

As always enjoy.








December 10, 2017

Pragmatism organizes the technology adoption lifecycle (TALC). The TALC is usually represented by a set of normal distributions summed into the normal we use to summarize what’s going on. In that summary we see the phases, the larger scale pragmatism outcomes, but not the smaller scale pragmatism outcomes within the phases, the pragmatism slices.

To begin in the beginning, we illuminate when we don’t have a sensor that can detect a signal. Otherwise, we go straight to the sensor, which gives us data in some range. We might have to clean it up. For a normal distribution or a Poisson distribution, we count up how often a value occurred, or the arrivals of values.

Eventually, we end up with a distribution or an envelope for randomness. That distribution houses the “noise.” We captured the data points. We summarize the data points into parameters that determine the shape of the distribution we are using to summarize our data. We make a normal with just two parameters: the mean and the standard deviation. With three pairs of numbers, we have the three normals of the TALC covered.
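As a sketch of “three pairs of numbers,” the snippet below builds three normals from (mean, standard deviation) pairs and sums them into one envelope. The parameter values are placeholders I made up, and the equal weighting is an assumption. Nothing here is measured TALC data.

```python
from statistics import NormalDist

# Hypothetical (mean, standard deviation) pairs, one per TALC normal.
# The numbers are placeholders for illustration, not measurements.
TALC_PARAMETERS = [(-2.0, 1.0), (0.0, 1.5), (3.0, 1.0)]
TALC_NORMALS = [NormalDist(mu, sigma) for mu, sigma in TALC_PARAMETERS]


def talc_density(x, components=TALC_NORMALS):
    """Sum the component densities and rescale so the total area is 1."""
    return sum(c.pdf(x) for c in components) / len(components)


def fit_normal(data_points):
    """Summarize raw data points into the two parameters of a normal."""
    fitted = NormalDist.from_samples(data_points)
    return fitted.mean, fitted.stdev


mean, sd = fit_normal([9.0, 10.0, 11.0, 10.0, 10.0])
print(mean, round(sd, 3))
print(round(talc_density(0.0), 4))
```

`fit_normal` is the “summarize the data points into parameters” step, and `talc_density` is the summed envelope.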

The TALC is a system built on noise. Yes, sorry, but sales is a random process. Marketing likes to think of itself as a methodical organization. Marketing discovers prospects, nurtures prospects, uncovers the buying process and the participants in the buy, and once the nurturing process moves all of those participants into the “I want this,” state, they set the appointment for the sales rep. Then, sales throws that lead in the trash.

While marketing was busy with all of that, sales picked up the phone and random walked themselves to revenue. And, finally, having sold, management tells the sales rep that they can’t do the deal because the prospect is an outlier. Just another day in the war between marketing and sales.

The TALC is anything but random. The TALC is a highly organized stochastic system. It’s like a radar. A radar sends out noise in a given distribution, a physical one. Only the frequencies that fit in the pipe make it to the antenna where they are transmitted. Then, they bounce off stuff and get back to the antenna where they again have to fit in the pipe. Outliers are trashed. In a company, that outlier prospect moves the population mean too far at too high a cost, so the company refuses to sell to them right now. A few years from now, that too far at too high a cost problem will be so yesterday.

Marketing already knew that. But, marketing is not random. Marketing has to be pragmatic when it faces a population organized by pragmatism. All that population wants is a business case that makes the buy reasonable. Reasonable is the real organizer. Jones bought this and got a hell of a success from it. But, you know us, we are not like Jones at all. Jones is an early adopter. We wait, not long, but we wait. We want to see the successes of businesses like ours. Jones is too early for our tastes. Just like sales is too early.

That the TALC is based on a summed set of normal distributions doesn’t help either. Those normals make this a stochastic system. The prospects do a random walk towards us. And, we do a random walk to our qualified prospects. “Qualified” filters those prospects. But, so does pragmatism.

I ran across the “Markov Chains: Why Walk When You Can Flow?” blog post on the Elements of Evolutionary Anthropology blog. Twitter random walks all of us. This post is about random walks.

The author started with an application demonstrating a random walk under a normal distribution. He shows the next attempted step in the random walk with a vector that is either red for failure or green for success. When the vector is green, the next step is taken, which results in a new data point being added to the distribution. When the vector is red, the data point is not added to the distribution, and another step is attempted.
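To make the green and red vectors concrete, here is a minimal sketch of a random-walk Metropolis sampler targeting a standard normal in one dimension. It is not the author’s application, and the function names, step size, and seed are my assumptions. One detail differs from my summary above: in the textbook algorithm a rejected (red) step records the current point again rather than recording nothing.

```python
import math
import random


def log_target(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def random_walk_metropolis(n_steps, step_size=1.0, start=0.0, seed=0):
    """Return (samples, acceptance_rate) for a 1-D random-walk Metropolis chain."""
    rng = random.Random(seed)
    current = start
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Propose the next step: the vector drawn from the current point.
        proposal = current + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(current)
        accept_probability = math.exp(min(0.0, log_ratio))
        if rng.random() < accept_probability:
            current = proposal  # green vector: take the step
            accepted += 1
        # Red vector: stay put; the current point is recorded again.
        samples.append(current)
    return samples, accepted / n_steps


samples, acceptance_rate = random_walk_metropolis(20_000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} variance={variance:.3f} accepted={acceptance_rate:.2f}")
```

The sample mean and variance should land near 0 and 1. The acceptance rate is the share of green vectors.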

Random Walk Metropolis-Hastings Normal

I annotated the author’s figure to show where the outliers sit, the Markov chain underlying the Metropolis-Hastings random walk and the TALC phases.


Random Walk Metropolis-Hastings Normal Annotated

On the y-axis normal, I indicated where the data generated by the random walk are either over or under the expected frequencies. Then, I added a hypothetical path via the green vectors. I colored the outliers in gold, but later I realized that there were more outliers beyond the six sigmas of the normal representing the TALC. I used the red circle to divide the additional outliers from the non-outlier tail of the normal.

Then, I labeled the TALC. That labeling might be unfamiliar. From the left, EA is the early adopter; C is the Chasm; V is a vertical market. The bowling alley (BA) is comprised of the early adopter and their vertical. The Chasm guards entry into the vertical. The technical enthusiasts are present across the TALC, not just at the beginning, so they have their layer. Their layer includes the cloud form-factor (C). This population was formerly considered to be phobics (P) or non-adopters, but the disappearance of the technology and the admin-free/infrastructural, aka somebody else’s problem, presentation fits the needs of phobics. Then, starting at the right again after the vertical phase, at the tornado (T), we enter the early mainstreet (EM), otherwise called the horizontal (H) or IT horizontal phase. Next, we enter the late mainstreet (LM), otherwise called the consumer phase. We exit the late mainstreet one of three ways: through M&A, through a second tornado (T), or by moving through or to the form factors of the device (D) phase and the cloud (C) phase. NA here means non-adopter.

We may extend the life of the category by going down market. The gray outermost circle represents the extent of the down market move.  This is where Christensen disruptions live, in the down market. They live elsewhere as well, but all of them are firmly anchored in the late mainstreet or consumer phase. Foster disruptions require discontinuous invention and innovation prior to the technical enthusiast phase.

I further illustrated progress through the TALC with thick red and blue arrows. Discontinuous innovations need the full pathway starting with the technical enthusiasts (TE) phase. Continuous innovation can start anywhere. These days it is typical to be in late mainstreet (LM) leaving a lot of money on the table, but the VCs investing there only know that phase, so they do not reap the returns that paid for everyone else. Cash is the game in the late mainstreet. B-schools preach the late mainstreet with its steady long-term commodities and the sport of competition.

The extent of the downmarket is shown with the light blue horizontal lines and the angled line that denotes the end of the category. The line the company going downmarket ends up on depends on how far downmarket they went. The end of the category depends on the extent of the downmarket move as well.

The author talks about the efficiency of the next step in the Markov path and how one explores only the areas under the normal that need to be explored. So his next figure takes a random walk around a narrow ring under the normal.

Random Walk Metropolis-Hastings Ring

In this figure, you see one phase of the TALC being rotated around under the normal. This would be the technical enthusiasts in their phase and the phobic or cloud phase. We find the next data point less often, less frequently, but the frequency of a given data point would be the same if a normal was used, but the overall process is faster when the area being explored is smaller.


So the math works out to be A=π(R^2-r^2) vs A=πR^2, which means that the ring does not take as long to compute. But, in a stochastic system, the random number generator knows nothing of rings, so many numbers get generated and disposed of unused. Smaller targets are harder to hit.
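A sketch of the two area formulas and of the “disposed of unused” point. It throws uniform random points at the square that encloses the disk and counts how many land in the ring. The radii R=1 and r=0.8, the draw count, and the function names are assumptions for illustration.

```python
import math
import random


def disk_area(R):
    return math.pi * R ** 2


def ring_area(R, r):
    return math.pi * (R ** 2 - r ** 2)


def ring_hit_rate(R, r, n_draws=100_000, seed=0):
    """Fraction of uniform draws from the enclosing square that land in the ring."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x = rng.uniform(-R, R)
        y = rng.uniform(-R, R)
        if r * r <= x * x + y * y <= R * R:
            hits += 1
    return hits / n_draws


R, r = 1.0, 0.8
expected = ring_area(R, r) / (2 * R) ** 2  # ring area over square area
observed = ring_hit_rate(R, r)
print(f"ring/disk area ratio: {ring_area(R, r) / disk_area(R):.2f}")
print(f"expected hit rate: {expected:.3f}  observed: {observed:.3f}")
```

With these radii the ring is 36 percent of the disk, yet only about 28 percent of the blind draws hit it. The rest are generated and thrown away.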
I annotated this one as well. There is a lot going on in that ring.
Random Walk Metropolis-Hastings Ring Annotated
The normal distribution in the ring is a circular normal. With a non-circular normal, the normal would be skewed until the density was consistent throughout the ring. That the distribution is not normal across the entire topology leaves us with skew and kurtosis. For the time being the distribution is trinomial. And, those individual nomials are interspersed with Poisson arrivals that eventually tend to and achieve a normal. Those Poisson distributions occur in the still empty areas of the ring.
Again, I’ve color coded the areas under the distribution used as being over and under the frequencies intended by the distribution being used, the target distribution.
A Pragmatism Slice
This figure shows us what a pragmatism slice looks like. But, in the TALC, we haven’t gone far enough in defining the target area yet.
A Pragmatism Slice 2
Here I went back to the TALC and focused on the technical enthusiast in the beginning of the TALC (TE) and those last two phases beyond the Late Mainstreet (LM), as in phobic (P) and laggard (L) or the device (D) and cloud (C) phases. There are real differences in mission between the early and late phases. There are real differences between outcomes, as an IPO premium for early phases, and no such thing for late phases. The early TEs play with the technology. The late TEs migrate the product to the new form factors. The late TEs might have to develop a product for the company that eventually acquires the TE’s company. Macromedia developed Captiva to this end. So these different times are looking for very different target populations.
Each pragmatism ring serves different roles in the software as media model. Early TEs play with the carrier. Late TEs play with different form factors, different carriers. Late TEs also distribute components differently as well. Each phase has different expectations and different levels of task sublimation. Task sublimation would be counter to the need of those in the early phases, yet essential to those in the late phases. The generic “task sublimation is good” finding is not so good as a generic piece of advice. Likewise design, or the notion that dot 1.0 functionality was awful. No, it wasn’t awful. It served geeks just fine. We didn’t ask developers to respect the carried domain, and really, we still don’t. Observation and asking questions is insufficient for what needs to be achieved.
The functionality problem is still with us, unsolved. Hiring UX developers still leaves the non-UX developers to code their functionality as they please. They still don’t do UX.
Those Poisson games played during the search for the next technology, those Poisson distributions show up throughout the TALC any time when we don’t have a valid sample or a normal free of skew and kurtosis.
Attend to your pragmatism slices. Don’t jump ahead then jump back. You moved your normal. They don’t go back well. Ask the next slice, the prospect slice, about what they need. Do this independent of your install base, your customers. Those prospects will need something different from your customers. You might as well have different lists for each of those slices. The carrier and the carried slices would be different as well. The carried and carrier code really can’t be written by the same developers. The disciplines being coded are too different. The carrier is easier than the carried. In general, we mess carried up. We know carrier. That’s where most developers live.
Way back in the nascent internet days, a developer was all hot to write an electronic store, but when I asked him if he had ever worked in a store, he said no. He was enamored with the carrier and thought the carried would be easy. Sorry, but stores have managers that live stores. A database developer lives databases. A database can be a nice metaphor for a store, but that’s poetry, not a store.
Enjoy your pragmatism slices. Don’t turn them into onions.
And, click the link and read the blog post. I haven’t read the whole thing yet.



From Time Series to Machine Learning

December 4, 2017

This post, “Notes and Thoughts on Clustering,” on the Ayasdi blog brought me back to some reading I had done a few weeks ago about clustering. It was my kind of thing. I took a time series view of the process. Another post on the same blog, “The Trust Challenge–Why Explainable AI is NOT Enough,” boils down to knowing why the machine learning application learned what it did, and where it went wrong. Or, to make it simpler, why did the weights change? Those weights change over time, hence the involvement of time series. Clustering likewise changes in various ways as n, with n standing in for time, changes. Again, time series is involved.

Time is what blew those supposed random mortgage packages up. The mortgages were temporally linked, not random. That was the problem.

In old 80’s style expert systems, the heuristics were mathematics, so for most of us the rules, the knowledge was not transparent to the users. When you built one, you could test it and read it. It couldn’t explain itself, but you could or someone could. This situation fit rules 34006 and 32,***. This is what we cannot do today. The learning is statistical, but not so transparent, not even to itself. ML cannot explain why it learned what it did. So now there is an effort to get ML to explain itself.

Lately, I’ve been looking at time series in ordinary statistics. When you have fewer than 36 data points the normal is a bad representation. The standard deviations expand and contract depending on where the next data point is. And, the same data point moves the mean. Then, there is skew and kurtosis. In finance class, there is skew risk and kurtosis risk. I don’t see statistics as necessarily a snapshot thing, only done once you have a mass of data. Acquiring a customer happens one customer at a time in the early days of a discontinuous innovation in the bowling alley. We just didn’t have the computing power in the past to animate distributions over time or by each data point. We were asked to shift to the Poisson distribution until we were normal. That works very well because the underlying geometry is hyperbolic, explaining why investors won’t put money on those innovations. The projections into the future get smaller and smaller the further out you go. The geometry hides the win.

It turns out there is much to see. See the “Moving Mean” section in the “Normals” post for a normal shifting from n=1 to n=4. Much changes from one data point to the next.

I haven’t demonstrated how clustering changes from one data point to the next. I’ll do that now.

Clustering DP1

At n=1, we have the first data point, DP1. DP1 is the first center of the first cluster, C1. The radius would be the default radius before any iterating that radius to some eventual diameter. It might be that the radius is close to the data point or at r=1.

At the next data point, DP2, it could have the same value as DP1. If so, the cluster will not move. It will remain stationary. The density of the cluster would go up. But, the standard deviation would be undefined.

Or, DP2 would be different from DP1 so the cluster will move and the radius might change. A cluster can handily contain three data points. Don’t expect to have more than one cluster with less than four data points.

Clustering DP2

At n=2, both data points would be in the first cluster. Both could be on the perimeter of the circle. The initial radius would be used before that radius would be iterated. With two points, the data points might sit on the circle at the widest width, which implies that they sit on a line acting as the diameter of the circle, or they could be closer together closer to the poles of the circle or sphere. C2 would be a calculated point, CP2 between the two data points, DP1 and DP2. The center of the cluster moves from C1 to C2, also labeled as moving from DP1 to CP2. The radius did not change. Both data points are on a diameter of the circle, which means they are as far apart as possible.

The first cluster, CL1, is erased. The purple arrow indicates the succession of clusters, from cluster CL1 centered at C1 to cluster CL2 centered at C2.

P1 is the perimeter of cluster CL1. P2 is the perimeter of cluster CL2. It takes a radius and a center to define a cluster. I’ve indicated a hierarchy, a data fusion, with a tree defining each cluster.

With two data points the center, C2 and CP2, would be at the intersection of the lines representing the means of the relevant dimensions. And, there would be a standard deviation for each dimension in the cluster.
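
The n=2 case can be sketched in a few lines of Python. The coordinates are hypothetical, since the illustration carries no numbers, and the population form of the standard deviation is an assumption. The sketch places the center at the per-dimension means, reports a standard deviation for each dimension, and takes the radius as half the separation, which holds only when both points sit on a diameter.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical coordinates for DP1 and DP2; the post gives no values.
dp1 = (2.0, 3.0)
dp2 = (6.0, 5.0)
points = [dp1, dp2]

# The center CP2 sits at the mean of each dimension.
cp2 = tuple(mean(axis) for axis in zip(*points))

# One standard deviation per dimension (population form, an assumption).
sigmas = tuple(pstdev(axis) for axis in zip(*points))

# If both points lie on a diameter, the radius is half their separation.
radius = dist(dp1, dp2) / 2

print(cp2, sigmas, radius)
```

Swapping in a different metric function for `dist` would change the radius but not the center, which is one reason to leave the definition of distance open.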

New data points inside the cluster can be ignored. The center and radius of the cluster do not need to change to accommodate these subsequent data points. The statistics describing the cluster might change.

A new data point inside the cluster might be on the perimeter of the circle/sphere/cluster. Or, that data point could be made to be on the perimeter by moving the center and enlarging the radius of the cluster.

The new data point inside the cluster could break the cluster into two clusters both with the same radius. That radius could be smaller than the original cluster. Overlapping clusters are to be avoided. All clusters are supposed to have the same radius. In the n=3 situation, one cluster would contain one data point, and a second cluster would contain two data points.

A new data point outside the current cluster would increase the radius of the cluster or divide into two clusters. Again, both clusters would have the same radius. That radius might be smaller than the original cluster.

Clustering DP3

With n=3, the center of the new cluster, C3, is located at CP3. CP3 would be on the perimeter of the cluster formerly associated with the first data point, DP1. The purple arrows indicate the overall movement of the centers. The purple numbers indicate the sequence of the arrows/vectors. We measure radius 3 from the perimeter of the third cluster and associate that with CP3, the computed center point of the third cluster, CL3.

Notice that the first cluster no longer exists and was erased, but remains in the illustration in outline form. The data point DP1 of the first cluster and the meta-data associated with that point are still relevant. The second cluster has been superseded as well but was retained in the illustration to show the direction of movement. The second cluster retains its original coloring.

Throughout this sequence of illustrations, I’ve indicated that the definition of distance is left to a metric function in each frame of the sequence. These days, I think of distributions prior to the normal as operating in hyperbolic space; at the normal, the underlying space becomes Euclidean; and beyond the normal, the underlying space becomes spherical. I’m not that deep into clustering yet, but n drives much.

Data points DP1 and DP2 did not move when the cluster moved to include DP3. This does not seem possible unless DP1 and DP2 were not on a diameter of the second cluster. I just don’t have the tools to verify this one way or another.

The distance between the original cluster and the second was large. The distance is much smaller between the second and third clusters.

This is the process, in general, that is used to cluster those large datasets and their snapshot view. Real clustering is very iterative and calculation intensive. Try to do your analysis with data that is normal. Test for normalcy.

When I got to the fourth data point, our single cluster got divided into two clusters. I ran out of time revising that figure to present the next clusters in another frame of our animation. I’ll revise the post at a later date.

More to the point an animated view is a part of achieving transparency in machine learning. I wouldn’t have enjoyed trying to see the effects of throwing one more assertion into Prolog and trying to figure out what it concluded after that.




November 27, 2017

Unit of Measure

Back in an earlier post, A Quick Viz, Long Days, I was wondering if the separate areas on a graphic were caused by the raster graphics package I was using, or if they were real. If a pixel is your unit of measure, then the discontinuities are real. The unit of measure drives the data. So yes, those disconnected areas would be Poisson distributions tending to the normal as the units of measurement get smaller.

Unit of Measure

In this figure, I changed the unit of measure used to measure the top shape. I increased the size of the unit square moving down the page. Then, for each of the measured shapes, I counted complete units and used Excel to give me a moving mean and standard deviation with time (n) moving left to right on each figure. In the first measurement, I generated a histogram of the black numbers below the shape.
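
The counting step can be sketched in Python. The bitmap below is a hypothetical stand-in for the measured shape, and `complete_units` is an illustrative helper, not the procedure used for the figure. It counts only the unit squares that are completely filled, column by column, for three unit sizes.

```python
# Hypothetical shape drawn as a bitmap; '#' marks filled cells.
SHAPE = [
    "..####..",
    ".######.",
    "########",
    "########",
    ".######.",
    "..####..",
    "...##...",
    "...##...",
]

def complete_units(shape, unit):
    """Count fully filled unit-by-unit squares in each column of the grid."""
    rows, cols = len(shape), len(shape[0])
    counts = []
    for left in range(0, cols - unit + 1, unit):
        count = 0
        for top in range(0, rows - unit + 1, unit):
            block = [shape[r][left:left + unit] for r in range(top, top + unit)]
            if all(cell == "#" for row in block for cell in row):
                count += 1
        counts.append(count)
    return counts

for unit in (1, 2, 4):
    print(unit, complete_units(SHAPE, unit))
```

With this shape the smallest unit captures every filled cell, the middle unit drops the partially filled edges, and the largest unit finds no complete squares at all. The data shrinks as the unit grows, even though the shape never changed.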

A graph of the moving averages appears above each shape in gray. A graph of the moving sigmas appears above each shape in black. This helps us see the maximum or minimum sigmas and means. It also reveals uninominal to multinominal structure, or how many normals are involved. In all cases, the means were uninominal involving a single normal. The results from the smallest pixel show that the sigma was binominal. The middle pixel resulted in three sigmas as the distribution was trinominal. The largest pixel resulted in a uninominal. In all three cases, the shape generated skewed distributions.

No time series windows were used.

Where the data was smaller than a pixel, it is highlighted in red and omitted from the pixel counts. You can see how the data was reduced each time the pixel size went up. The grids imposing the pixelizations were not applied in a standard way. We did not have an average when the grids were applied. The red pixels could be counted with Poisson distributions. They are waiting to trend to the normal. Or, they could be features waiting for validation. In a discontinuous innovation portfolio, they could be lanes in the bowling alley waiting for their client’s period of exclusion to expire, or waiting to cross the chasm. Continuous innovations do not cross Moore’s chasm. Continuous innovations might face scale chasms or downmarket moves via disruption or otherwise. All of these things impede progress through the customer base. They would be red. Do you count them or not?

Grids have size problems just like histogram bins.

A Moving Mean

When you first start collecting data each data point changes the normal massively. We hide this by using a large amount of data after the fact, rather than like a time series building out a normal towards the standard normal, or a Poisson distribution and increasing the number of data points until the normal is achieved.

When watching a normal go from 1 to n, it matters where the next data point comes from. If the data point is the third or more, it will be inside or outside the core, or, as an outlier, outside the distribution entirely. In the core, an area defined by being plus or minus one sigma, one standard deviation from the mean, the density goes up, the sigma might shrink. That sigma won’t get wider. Outside the core, in the tail, the sigma might get wider. The sigma won’t get narrower. These would change the circumference of the circle representing the footprint of the normal. An outlier makes the normal wider. That outlier would definitely move the mean.
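
A small Python check of the core and tail behavior, as a sketch only. The three starting values are hypothetical, and the population standard deviation is assumed. It adds one new point inside the core, one in the tail, and one far outlier, and compares the sigma before and after.

```python
from statistics import mean, pstdev

def sigma_after(data, new_point):
    """Population sigma once new_point joins the data."""
    return pstdev(data + [new_point])

# Hypothetical starting sample of three data points.
data = [10.0, 20.0, 30.0]
mu, sigma = mean(data), pstdev(data)

in_core = mu + 0.5 * sigma   # inside plus or minus one sigma
in_tail = mu + 2.0 * sigma   # outside the core
outlier = mu + 6.0 * sigma   # far outside

for label, x in [("core", in_core), ("tail", in_tail), ("outlier", outlier)]:
    print(label, round(sigma, 2), "->", round(sigma_after(data, x), 2))
```

The point in the core narrows the sigma, and the tail point and the outlier widen it, with the outlier widening it the most.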

So what is the big deal about moving the mean? It moves the core. It’s only data. No. That normal resulted from the sum of all the processes and policies of the company. A population makes demands of the company and the product. When the core moves, some capabilities are no longer needed, some attitudes are no longer acceptable. On the financial side of the house, skew risk and kurtosis risk are real. When the core moves, the tails move. The further the core moves, the further the tail moves in the direction of the outlier.

Sales is a random process. Marketing is not. We don’t much notice this when we are selling commodity goods, but with a discontinuous innovation, that outlier sale has many costs that we have never experienced. The technology adoption lifecycle is only random when you pick where you start, your initial position, in the middle and work towards the death of the category. Picking the late mainstream phase because it’s all you know, leaves a lot of money on the table and rushes that population to the buy before the business case they need to see is ready to be seen. But, picking late mainstream also means you’re fast following. Don’t worry. The innovation press will still call your company innovative. Hell, yours is purple and the market leader’s version is brown.

But, let’s say you began in the beginning and went through the early phases, coming out of the tornado as the market leader. You will have gone from a Poisson distribution to the three sigma normal to the six, to the twelve, to more. Your normal will dance around before it sets its anchor at the mean and stays put while it grows outward in sigmas.

That outlier that sales demands and we refused eventually will be reached. Sales just got ahead of itself and cost the company quite a bit trying to build the capabilities the outlier takes for granted.

I sat down with a spreadsheet and sold one customer, built the normal, and sold another, built another normal. That first customer was narrow and very tall. It’s as tall as that normal will ever be. It looks like a Dirac function. Of course, there is no standard deviation when you have a single data point. I fudged the normal by giving it a standard deviation of one. And, the standard normal looks like any other standard normal. Only the measurement scales changed from one normal to the next. The normals get lower and wider as the population gets larger.

I did this without a spreadsheet, but I got normals with a kurtosis value, but no skew or kurtosis are produced by those standard normal generators. So this first figure is the first data point. It may be a few weeks until the next sale. Or, this might be a developer’s view of some functionality that certainly hasn’t been validated yet. Internal agilists never dealt with this problem. The unit measure is a standard deviation, a sigma.

Normal Distribution N eq 1

Normal Distribution N eq 2 and 3

In the figure above, DP1 is the first data point and the first mean. So I went on to the next data point.

Here, in the figure above, the distribution for the second data point, DP2, is the gold one. The standard deviation was 13. The mean for the gold distribution is represented by the blue line extending to the peak of the gold distribution. The black vertical lines extending upwards to the gold distribution demark the core of the gold normal. In the top-down view, the normal and its core are shown as black circles. With a standard deviation of 13, three standard deviations are 39 units wide.

The next data point, the third data point, DP3, gives us the third mean. This mean is shown as a red line extending to the top of the pink distribution. In the top-down view, this normal and its core are shown as red circles. Notice that the height of this normal is lower than that of the gold normal. Also notice that this new data point is inside the core of the previous normal, so this normal contracts. With a standard deviation of 11, three standard deviations are 33 units wide. The third mean moved, so there is some movement of the distribution.
Horizontally and Vertically Correct

The figure above is illustrative but wrong. The vertical scale is off. So I rescaled the normals generated for the second and third data points. And, a fourth data point was added as an outlier. No normal was generated for it. That would be the next thing to do in this exploration.
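
One way to check a vertical scale, assuming the curves are drawn as probability densities (the post does not say how the rescaling was done), is the peak height of a normal, 1 / (sigma * sqrt(2 * pi)). The sketch below uses the sigmas mentioned in this post, and `peak_height` is an illustrative name.

```python
from math import pi, sqrt

def peak_height(sigma):
    """Height of a normal density at its mean: 1 / (sigma * sqrt(2 * pi))."""
    return 1.0 / (sigma * sqrt(2.0 * pi))

# Sigmas from the post: the fudged first normal (1), the gold normal (13),
# and the pink normal (11).
for label, sigma in [("n=1 (fudged)", 1), ("gold", 13), ("pink", 11)]:
    print(label, round(peak_height(sigma), 4))
```

Under that assumption the narrower pink normal peaks slightly above the gold one, and both sit far below the single-point normal.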

The black arrows at the foot of the gold normal show the probability mass flowing into the pink normal. The white area is shared by both distributions.

Where I labeled the mean, median, and mode as being the same is not real either. The distribution is not normal. I tried to draw the skewed distribution shown by the numbers from the spreadsheet. Eventually, I left that to the spreadsheet. In a skewed distribution all three numbers separate. The mean is closest to the tail.

In the top-down view, the outer circle is associated with the outlier.

The means moved from 5 to 18 to 20, and to 34 in response to the addition of the outlier at 75. The footprint of the normal expands with the addition of the outlier, and contracts in response to the addition of the third data point at 24.
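
The moving mean and sigma can be reproduced with a short Python sketch. DP1 at 5, DP3 at 24, and the outlier at 75 come from the text. DP2 = 31 is inferred from the reported means, so treat it as an assumption. The population standard deviation is used because it matches the sigmas of 13 and 11 quoted above.

```python
from statistics import mean, pstdev

# DP1, DP3, and the outlier are stated in the post; DP2 = 31 is inferred
# from the reported means (5, then 18), so treat it as an assumption.
data_points = [5, 31, 24, 75]

history = []
for n in range(1, len(data_points) + 1):
    seen = data_points[:n]
    history.append((n, mean(seen), pstdev(seen)))

for n, mu, sigma in history:
    print(f"n={n} mean={mu:.2f} sigma={sigma:.2f}")
```

The single outlier moves the mean from 20 to 33.75 and more than doubles the sigma, which is the expansion of the footprint described above.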

The distribution is like gelatin.

Now, I got out the spreadsheet. I built a histogram and then put the line graph of a normal over it. The line graph doesn’t look normal at all.

Histogram w normal

So I took the normal off.

Histogram wo normal

This showed three peaks, which drove the normal to show us a trinomial that was right or positively skewed. This data has a long way to go before it is really normal. When I tried to hand draw the distribution, it looked left or negatively skewed. Adding the outlier caused this.

No, I’m not going to add another data point and keep on going. I’ll wait until I get my programmer to automate this animation. I did try to get a blog up for our new company, but WordPress has not gotten easier to use since the last time I set up a blog. Anyway, they told us in statistics class that the normal wouldn’t stabilize below 36 data points. We looked at this. Use a Poisson distribution instead. Set some policy about how many data points you have to have before you call a question answered.

Hypothesis Testing over time

In Agile, the developer wants to get to validation as quickly as possible. Using the distributions at n = 2 and n = 3, we can test a hypothesis. We will test at n = 3 (now) and n = 3 - 1 = 2 (previous). Since n = 3 contracted, we could accept H1 previously and no longer accept H1 now.
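
Here is one hedged reading of that flip in Python. It assumes that accepting H1 means the hypothesized value falls inside the one-sigma core, which is an assumption of this sketch and not a formal test. The data points are the ones used in this post, with DP2 = 31 inferred from the reported means, and the hypothesized value of 6 is made up to sit near the edge of the core.

```python
from statistics import mean, pstdev

# Data points from the post; DP2 = 31 is inferred from the reported means.
data_points = [5, 31, 24]

def in_core(sample, value, k=1.0):
    """True when value lies within k population sigmas of the sample mean."""
    mu, sigma = mean(sample), pstdev(sample)
    return abs(value - mu) <= k * sigma

# Hypothetical hypothesized value, chosen to sit near the edge of the core.
h1_value = 6

previous = in_core(data_points[:2], h1_value)  # n = 2
now = in_core(data_points[:3], h1_value)       # n = 3
print(previous, now)
```

The same value is inside the core at n = 2 and outside it at n = 3, so the answer to the question changed with one more data point.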

I did not compensate for the skew in the original situation. The top-down view shows that with skew rejecting a hypothesis depends on direction. In our situation, the mean only moved to the right or the left. With another axis, the future distribution could move up or down, so there is even more sensitivity to skew and kurtosis. And, these sensitivities are financial risks. Sales to outliers translate into skew and kurtosis. These sales can also be costly in terms of, again, the cost of the capabilities needed to service the account.

Beware of subsets. With any given subset, that subset will likewise need 36 or more data points before the normal stabilizes. Skew risk and kurtosis risk will be realized otherwise.


Upmarket and Downmarket

November 4, 2017

A while back I ran across a developer coding for the upmarket. It took me a while to recall what an upmarket move was. Geez. And, when you’re talking upmarket, there is a down market. I don’t think in those terms since they are late main street and the horizontal phase issues. Not my game.


I decided to look at them from the standpoint of the technology adoption lifecycle, so I drew two figures to take a look at them.

Market Definition--Down Market

I drew the downmarket case starting with the technology adoption lifecycle (TALC) as a normal of normals. The company is in the late mainstreet phase. This is usually where a company builds a downmarket strategy. Companies in this phase are on the decline side of the TALC. Growth is really a matter of consuming the market faster and reaching the end of the road, the death of the category, sooner. Growth is a stock market trick. Going downmarket is a way to grow by actually increasing the size of the population that the company is facing.

I labeled the baseline of the TALC “Former.” Then, I drew another line under the TALC. This line should be long enough to contain the population that the company is moving downmarket to capture. I labeled this line “Planned.” Then, I drew a standard normal to sit on this new line extending from the original normal. I did not normalize the new normal.

The current market is a subset of the new down-marketed market. The new market need not be centered at the mean of the current market. The population will be new so the mean and standard deviation could differ. The standard normal view of the TALC assumes a symmetrical distribution. This need not be the case. Having two means does make a mess of the statistics. It might not look like a binomial. It will exhibit some kurtosis. The efforts separating the means will take time and planning. If the company is public, it must provide guidance before making such efforts. Don’t switch before providing those projections to the investors.

I went with having one mean in the figure.

The downmarket effort starts with making the decision. The decision will require some infrastructural changes to the marketing and sales efforts at a minimum. It will also require some UX and code revisions to give the downmarket user relevant interfaces. Simple things become much harder when the user doesn’t have the funds they need. The cognitive model may differ from that of the upmarket. These problems may or may not be an issue with your software. The decision might be made across products, particularly in a company organized around their bowling alley. That could mean that this downmarket might be a permanent element across all products.

After some period of time, the decision to move downmarket will become operational. Sales may continue in the current markets as other sales efforts address the new downmarket or the current market might be deemphasized or delayed. I removed it. I color coded the lost earnings in yellow and notated it with a negative sign (-). I color coded the gained earnings in green and notated it with a positive sign (+). The gained earnings are dwarfed by the lost earnings as the scale of the market grows and subsequently hits the first scale constraint. Then, the downmarket move will stop until the current population and projected population can be supported. Efforts to support the increase in scale can start earlier before the scale constraint generates a crisis.

Beyond the first scale constraint, the gains begin to drown the losses. Then, the next scale constraint kicks in. Once again the downmarket move will stop until the infrastructure can support the needs being generated by the downmarket move.

Beyond the second scale constraint, the losses dry up and the gains continue out until the convergence of the normal with the x-axis happens, aka the death of the category. Another managerial action will need to be taken to further extend the life of the category.

Notice that I moved the baseline downward beyond the second scale constraint. I labeled this “Overshoot.” I did this to make the losses look continuous. Initially, the curve sat on the original downmarket baseline, but this gave a sawtooth-shaped curve. I’m unsure at the time of this writing which representation is better. As shown, the convergence with the baseline of the normal shows up on the “Overshoot” line.

Pricing will drive the speed of the downmarket realization. Pricing might impair the downmarket move. The net result of the downmarket move will be an increase in seats, which turns into an increase in eyeballs, and an extension of the life of the category. Financial results will depend on price, policies, and timeframes.


In the TALC, we usually start in the upmarket and work our way to the downmarket as we move from early (left) to late (right) phases, from growth to decline. Hardly ever does a company move upmarket after being a lower priced commodity.

Market Definition--Up Market

Here I started with the TALC again. I selected a target population, a smaller population, and drew a horizontal line; the area above it represents the upmarket. The upmarket as a horizontal slice across the normal is shown in yellow and gold. Renormalizing that gets us the green and orange normals. The purple arrow behind the normals provides an operational view as sales grow the eventual standard normal shown in orange. The zeros convey how the market is not growing. The higher prices of an upmarket might shrink the size of the market.

When converting an existing market to a higher price, we can consider the market to be Poisson, eventually a kurtotic normal shown with the gray normals, and finally a standard normal without kurtosis. The figure skips the Poisson distribution and begins with the kurtotic normal. Normals with small populations are taller. They shrink towards the standard normal. When a normal is kurtotic it exhibits a slant which disappears as the kurtosis goes away.

I called all of these changes in the size, shape, and slant of the normal the “Price Dance.” This dance is illustrated with the purple arrows. Once the standard normal is achieved, kurtosis risk is removed. As the standard normal gains sigmas, the risk is reduced further.

The Poisson distribution representing the initial sales at the higher price puts the product back in hyperbolic space. Once the single sigma, standard normal is achieved, the product is in Euclidean space. From the single-sigma standard normal, the sigmas increase. That puts the product in spherical space where the degrees of freedom of strategy and tactics increase making many winning strategies possible. In the hyperbolic space, those degrees of freedom are less than one. Euclidean space has a single degree of freedom. This implies that the Euclidean space is transitory.

The net result of the upmarket move will be an increase in revenues depending on pricing. The number of seats will remain constant with optimal pricing, which in turn leaves eyeballs unchanged. Upmarket moves shorten the life of the category.


Downmarket moves take a lot of work, more work than an upmarket move. In both cases, the marketing communications will change. Upmarket moves get you more dollars per seat, but you would have to be selling the product. The number of seats does not change or falls with an upmarket move. Downmarket moves get you more seats, more eyeballs, and given pricing, more revenues if any are independent revenues from eyeballs. Downmarket moves extend the life of the category/product/company. Upmarket moves shorten those lives.

Downmarket and upmarket moves are orthodox strategies and tactics. Talk with your CFO. I’d rather keep the lanes of my bowling alley full.


A Quick Viz, Long Days

October 29, 2017

Three days ago, out on Twitter, a peep tweeted a graph that was supposed to show how a market event amounted to nothing. The line graph dropped from the baseline, rose above the baseline, and dropped again to the baseline. It was a quick thing that had me spending the rest of the day, and parts of the following three days hammering on it.

0 Net Zero

The peep’s point was that nothing happened. Grab a hammer and join me in building a case showing just how much did happen.

This was their graph. If you’re in a hurry, you won’t notice the net loss.

I rotated the minima so I could see if the loss was completely recovered. It was not. The vertical symmetry is asymmetric. Rotating the minima reveals a gap, labeled A, which shows that the upside did not completely recover the value lost during the first downside.

The second downside loss stops at the line labeled B, the new baseline. There is a gap between the initial baseline and the final baseline. The gap between the baselines is larger than the gap between the peaks. I copied the gap between the peaks and put it below the initial baseline to demonstrate that the loss at A did not account for all the loss between the baselines. Subtracting the loss A from the loss between the baselines gives us the gap labeled B.

Notice that the baseline at B moves up slightly. I just saw this after drawing many diagrams. I annotated my error. We will ignore this slight upside. It is just one more thing that the peep and I overlooked. I will remove it from subsequent diagrams.

Going back to the first diagram, we had a downside, an upside, and another downside. The first downside (A) and the second downside (B) account for the difference between the initial and final baselines.

In the figure on the right, I explored the symmetries. The vertical red lines represent the events embedded in the signal. The notation for the symmetry of an event n spans the interval from n-1 to n+1. These spans are shown in gray.

Since I rotated the minima, the symmetry above the signal is actually a vertical (y-axis) symmetry around the origin. I drew purple lines from the vertex at the top to the vertexes at the baseline. Then, I moved the purple lines to the top of the figure. They looked symmetric, but are slightly asymmetric. The left side was three units wide; the right, four units wide.

Both of the horizontal (x-axis) symmetries are asymmetric. The gray box notation demonstrates that these signal components are very asymmetric.

Asymmetries indicate locations where something was learned or forgotten. The repeal of the Glass-Steagall Act often gets cited as one of the causes of the housing crisis. It was a forgetting. In Stewart Brand’s “How Buildings Learn,” buildings learned by accretion. We accrete synapses as we learn. When we put a picture on a wall, the wall learns about our preferences. The next resident may not pull that nail out, so such remodeling artifacts accrete. Our house becomes our home, because we teach our house, and our house learns. So it is with evolution.

Before I created the box notation, I was drawing the upside and downside lines and rotating them to see how much area was involved in each of the asymmetries. I’m using the rotation approach in the figure to the left. I’ve annotated the three asymmetries. The white areas are cores, and the orange areas are tails. The asymmetry annotated at the top of the figure is, again, horizontal. The tail is just a line as the asymmetry is slight. The cores are symmetric about vertical lines, not shown, that represent the events encoded into the signal.

In an earlier figure, I just estimated the area of the tail. When I highlighted that area, because I use MS Paint to draw these things and it dithers, I got a line of green areas, rather than a single area. I numbered them in order. They are labeled as Area Discontinuities. In a sense, they would be Poisson distributions in individual Poisson games. In area 8, those Poisson distributions become a single normal distribution. That normal has more than 32 data points. With 20 data points, that normal can be estimated. In a sense, there is a line through those Poissons and the normal. This is what happens in the technology adoption lifecycle as we move from early adopters each with their own Poisson game and sum towards the vertical/domain specific market of which the early adopter is a member. This line is one lane of Moore’s bowling alley.

Where the figure mentions “Slower,” that is just about the slope of that last diagonal, the second loss. The red numbers refer to the earlier unrefined gaps we are now calling A and B.

When there are tails, the normal distribution involved will exhibit kurtosis. I built a histogram of the data in the area that I highlighted in green and then looked at the underlying distribution along the line through those areas. There seemed to be two tails: one thicker and one thinner. Of course, all of this is meaningless, as it results from the dithering. With a vector rendering, there would only be one more consistent area.

The tiny thumbnail in the middle of the thumbnails at the bottom right of the figure shows a negatively skewed normal, but in another interpretation, the distribution is four separate normals. Where I mentioned theta, the associated angle quantifies the kurtosis.

One more thing is happening where a Poisson distribution finally becomes a normal distribution: the geometry shifts from hyperbolic to Euclidean.




In the next figure, I look at the black swan view of the signal. A black swan is usually drawn as a vertical line cutting off the tail of the normal distribution, labeled Original and highlighted with yellow and light green. Here we are talking generally. In the figure after that, we will use this to show how the three black swans generate the signal that we’ve been discussing. The negative black swan throws away the portion of the distribution remaining beyond the event driving the black swan; then the remaining data is used to renormalize the remaining subset of the original data. The lifetime of the category is reduced. The convergence with the x-axis contracts, aka moves towards the y-axis. The positive black swan moves the distribution down. The normal becomes enlarged, so it sits on the new x-axis below the original baseline. The new distribution includes the light green and green areas in the figure. The lifetime of the category is lengthened. The convergence moves out into the future, aka moves further away from the y-axis.
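The negative black swan can be sketched numerically under one reading of "throw away and renormalize": treat it as truncating a normal at the event and rescaling what is left so it integrates to one again. This is an assumption about the mechanics, not the method used to draw the figures. The standard normal and the cutoff of 1.0 are hypothetical values.

```python
from statistics import NormalDist

def negative_black_swan(dist, cutoff):
    """Truncate `dist` at `cutoff`; return (kept_mass, renormalized_pdf).

    kept_mass is the probability to the left of the cutoff. Dividing the
    original density by it makes the surviving piece integrate to one.
    """
    kept_mass = dist.cdf(cutoff)

    def pdf(x):
        return dist.pdf(x) / kept_mass if x <= cutoff else 0.0

    return kept_mass, pdf

original = NormalDist(mu=0.0, sigma=1.0)  # hypothetical category distribution
kept, renormalized = negative_black_swan(original, cutoff=1.0)

print(f"mass kept: {kept:.3f}")
print(f"peak before: {original.pdf(0.0):.3f}  peak after: {renormalized(0.0):.3f}")
```

Under this reading the peak gets taller and the distribution ends at the cutoff, which matches the "higher and narrower" behavior described for the negative black swan.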

In the continuous innovation case, the positive black swan will stay aligned with the driving event. The normal distribution is enlarged just enough to converge with the new x-axis below the prior x-axis. In the discontinuous innovation case, the positive black swan would begin at the B2B early adopter phase of the technology adoption lifecycle. In the discontinuous case, the size of the addressable market would drive the size of the normal, and it is not correlated with the prior distribution.

Now we go back to the example we’ve worked on throughout this post. We will apply the black swan concepts to the signal using the diagram below. There are three black swans. A negative black swan generates the first loss. A positive black swan follows with a recovery that does not fully recover the value lost in that first loss. This recovery is followed by another negative black swan that contributes to the net loss summed up by the signal.

The normals are numbered 0 through 3. The numbers are to the right of the events, and they are on the baseline of the associated normal. The original distribution (0) is located at the event driving the first black swan. The new distribution (1) is associated with the first loss, the first negative black swan. The x-axis of this black swan is raised above the original x-axis. This distribution lost the projected data to the right of the event, data expected from the future. Renormalizing the distribution makes it higher from peak to the new baseline, and the distribution contracts horizontally. The rightmost convergence of the normal with the x-axis is where the category ends. The leftmost convergence is fixed. The x-axis represents time. The end of the category will arrive sooner unless some other means to generate revenues is found, aka a continuous innovation is found.

The first gain, aka the positive black swan, generates a larger distribution (2). The x-axis is lower than the immediately prior x-axis. The convergence moves into the future relative to the immediately prior distribution. This is followed by another loss, the second loss, the second negative black swan. Here the x-axis rises above the previous x-axis. The distribution (3) is renormalized and is smaller than the immediately previous distribution (2).

From a signal perspective, the original signal input was above the output. The black swans move the signal to the line labeled “Restatement.” The shape of the original and restatement generate and output the same signal.


Next, we look at the logic underlying the signal. I’ll use the triangle model. In that model, every line is generated by a decision tree represented by a triangle. The x-axis has decision trees, aka triangles, associated with it. Each interval on the x-axis has its own decision tree. The y-axis has its own intervals and decision trees. The events that drove the black swan model drive the intervals and associated decision trees.


The pink triangles represent the y-axis decision trees involved in the losses. The green triangle represents the y-axis decision tree for the gain. The green triangle is higher than the gain, because the gain does not recover the entire loss from the first loss. I annotated the shortfall. The asymmetry in the vertical axis that we discussed earlier appears on the upper right side of the triangle, which is thicker. This thickness is not constant. The colors and the numbers show the patterns involved on that side of the triangle. The axis of symmetry associated with the green triangle is an average between the baseline of the input signal and the baseline of the output signal. Drawing this symmetry axis would increase the asymmetry of the representation.

The erosion would be shown more accurately as subtrees, rather than a single subtree starting at the vertex, like a slice of pie.

On the x-axis, each triangle is shown in blue. The leftmost triangle consists of a blue triangle and a yellow triangle. The blue triangle represents the construction of the infrastructure that generates that interval of the signal. The yellow triangle represents the erosion of that infrastructure. The black swan, the first loss, resulted from that erosion.

Keep in mind that negative black swans reduce the probability, so they move their baselines up vertically. Positive black swans increase the probability, so they move their baselines down vertically.

In the very first figure, I annotated the asymmetries and symmetries. Asymmetries are very important because they inform us that learning is necessary. Asymmetries in the normal distribution show up as kurtosis due to samples being too small to achieve kurtosis-free normality or symmetry.
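The claim that small samples show kurtosis can be checked with a quick simulation. This is a minimal sketch, assuming the population formula for excess kurtosis and draws from a true standard normal. The sample sizes 20 and 36 echo the counts used elsewhere on this blog, and 1000 is just a large comparison point. The trial count and seed are arbitrary.

```python
import random

def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3 (zero for a perfect normal)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3.0

def kurtosis_spread(sample_size, trials=500, seed=1):
    """Average absolute excess kurtosis across many normal samples of one size."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        total += abs(excess_kurtosis(sample))
    return total / trials

for n in (20, 36, 1000):
    print(n, round(kurtosis_spread(n), 3))
```

Even though every sample comes from a perfectly symmetric normal, the smaller samples should report noticeably more kurtosis than the large one.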

The vertical orientation of those pink triangles is new to me as I wrote this. They represent the infrastructure to stop loss, a reactive action. The results may appear positive, but in the long run, they represent exposure. These actions will be instanced for the situation being faced. Given that a black swan can happen at any moment, you don’t want to have to invent a response. You want to move from reactive to predictive to proactive time orientations as quickly as possible. Many people see OODA loops as a reactive mechanism. The military trains on the stuff, on the infrastructure, decision trees being part of that infrastructure. Know before you go. Eliminate or reduce those asymmetries before you get into the field, before the black swan shows up.

The original signal view, the black swan/distribution view, and the logical view are tied together by the red lines representing the events.


I drew another figure that is a bit cleaner about the signal view.  The


Even if the signal looks like nothing, a net zero, take a closer look. There was much to be seen; much learning got done to produce the result. Know before you go.





The Mortgage Crisis

September 5, 2017

Last week, I came across another repetition of what passes for an explanation of the mortgage crisis. It claimed that the problem was the prevalence of low-quality loans. Sorry, but no. I’m tired of hearing it.

A mortgage package combines loans of all qualities, of all risks. But, being an entity relying on stochastic processes, it must be random. Unfortunately, those mortgage packages were not random. This is the real failing of those mortgage packages. Mortgages happen over time and are temporally organized, as in not random.

The housing boom was great for bankers up to the point where they ran out of high-quality loans. At that point, the mortgage industry looked around for ways to make lower quality loans. Mortgage packages gave them the means. When fifty loans got sold in a given week, the lender packaged them into one package. Some of those loans were refinancing loans on high-quality borrowers. Rolling other debts into the instrument improved the borrower’s credit but didn’t do much for the mortgage package. Still, the averages worked out. Otherwise, the lender could throw in a few of the pre-mortgage-packaging loans, high-quality loans, to improve the numbers. A few people had to make payments to their new mortgage holding company. Their problem.

But, the real risk was that all of the original fifty loans originated from the same week. They were temporally organized. That breached the definition of the underlying necessities of stochastic systems. That was the part of the iceberg that nobody could see. That’s the explanation that should be endlessly retweeted on Twitter.
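To make the temporal-organization point concrete, here is a toy simulation. It assumes something the post does not state outright: that loans originated in the same week share one week-level default probability. All numbers (calm weeks at 2%, one stressed week in five at 30%, fifty loans per package) are hypothetical and chosen only to show the effect.

```python
import random
from statistics import pvariance

def default_rate(probabilities, rng):
    """Fraction of the loans in one package that default."""
    defaults = sum(1 for p in probabilities if rng.random() < p)
    return defaults / len(probabilities)

def simulate(packages=2000, loans=50, seed=7):
    """Variance of package default rates: same-week packaging vs. random mixing.

    Assumption: every origination week carries one shared default
    probability, calm (0.02) most weeks and stressed (0.30) one week in five.
    """
    rng = random.Random(seed)

    def week_probability():
        return 0.30 if rng.random() < 0.2 else 0.02

    same_week, mixed = [], []
    for _ in range(packages):
        shared = week_probability()  # one week, shared by all fifty loans
        same_week.append(default_rate([shared] * loans, rng))
        # Random mixing: every loan comes from its own independently drawn week.
        mixed.append(default_rate([week_probability() for _ in range(loans)], rng))
    return pvariance(same_week), pvariance(mixed)

same_var, mixed_var = simulate()
print(f"same-week variance: {same_var:.4f}  mixed variance: {mixed_var:.4f}")
```

Both packaging schemes have the same average loan quality. Only the same-week packages swing wildly, because their loans fail together.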

Why? Well, we are no longer living in a production economy. You can make money without production. You can make money from the volatility economy. You can make money off of puts and calls and packages of those. That allows you to make money off of your own failures to run a successful business. Just hedge. The volatility economy is a multitude of collections of volatility based on a stochastic system, the stock market. And, with the wrong lessons having been learned about mortgage packages, the regulators want to regulate mortgage packages and other stochastic systems. Or, just make them flat out illegal because they didn’t know how to regulate them. I’m not against regulation. Constraints create wealth. I just see the need for stochastic systems.

Too many stories are wrong, yet endlessly repeated on Twitter. Kodak, …. 3M, …. There was only one writer who wrote about Kodak and understood the real story. With 3M, their innovation story was long past and still being told when the new CEO gutted the much-cited program.

From the product manager view, where do stochastic systems fit in? The bowling alley is a risk package akin to a mortgage package. But, if you are an “innovative” company much-cited in the innovation press these days, don’t worry, your innovation is continuous. The only innovations showing up in the bowling alley are discontinuous. Likewise, crossing the chasm, as originally defined by Moore, was for discontinuous innovations. Those other chasms are matters of scale, rather than the behavior of pragmatism slices.

But, back on point, we engage in stochastic systems even beyond the bowling alley. A UI control has a use frequency. When it has a bug, that use frequency changes. Use itself is a finite entity unless you work at making your users stay in your functionality longer. All of that boils down to probabilities. So we have a stochastic system on our hands. In some cases, we even have a volatility economy on our hands.
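As a rough illustration of treating a control's use frequency as a probability, the sketch below compares the frequency before and after a bug with a standard two-proportion z-score. The function names and the session counts are hypothetical. A real product would pull these numbers from its own telemetry.

```python
import math

def use_frequency(uses, sessions):
    """Share of sessions in which the control was used."""
    return uses / sessions

def frequency_shift_z(uses_before, sessions_before, uses_after, sessions_after):
    """Two-proportion z-score for a change in use frequency."""
    p_before = use_frequency(uses_before, sessions_before)
    p_after = use_frequency(uses_after, sessions_after)
    pooled = (uses_before + uses_after) / (sessions_before + sessions_after)
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / sessions_before + 1 / sessions_after)
    )
    return (p_after - p_before) / std_err

# Hypothetical counts: 400 of 1000 sessions before the bug, 300 of 1000 after.
z = frequency_shift_z(400, 1000, 300, 1000)
print(f"z-score for the drop in use frequency: {z:.2f}")
```

A large negative z-score says the drop is unlikely to be noise. A value near zero says the bug did not move the number.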


A Different View of the TALC Geometries

August 25, 2017

I’ve been trying to convey some intuition about why we underestimate the value of discontinuous innovation. The numbers are always small, so small that the standard financial analysis results in a no go decision, a decision not to invest. That standard spreadsheet analysis is done in L2, a Euclidean space. This analysis gets done while the innovation is in hyperbolic space so the underestimation of value would be the normal outcome.

In hyperbolic space, infinity is away at the edge at a distance. In hyperbolic space, the unit measure appears smaller at infinity when viewed from Euclidean space. This can be seen in a hyperbolic tiling. But, we need to keep something in mind here and throughout this discussion: the areas of the circle are the same in Euclidean space. The transform, the projection into hyperbolic space, makes it seem otherwise. That L2 financial analysis assumes Euclidean space while the underlying space is hyperbolic, where small does not mean small.
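The shrinking unit measure can be shown with the Poincaré disk, which I am assuming is the model behind the tiling picture. In that model, a point at hyperbolic distance d from the center sits at Euclidean radius tanh(d/2). The sketch walks outward in equal hyperbolic steps and reports how long each step looks in the Euclidean picture.

```python
import math

def euclidean_radius(hyperbolic_distance):
    """Radius in the Poincare unit disk of a point at this hyperbolic distance from the center."""
    return math.tanh(hyperbolic_distance / 2.0)

def apparent_step_sizes(steps):
    """Euclidean length of each successive unit-length hyperbolic step outward."""
    return [euclidean_radius(k) - euclidean_radius(k - 1) for k in range(1, steps + 1)]

for k, size in enumerate(apparent_step_sizes(6), start=1):
    print(f"step {k}: looks {size:.4f} long in the Euclidean picture")
```

Every step is the same length in hyperbolic terms. The steps only look smaller toward the edge, which is the sense in which small does not mean small.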

How many innovations, discontinuous ones, have been killed off by this projection? Uncountably many discontinuous innovations have died at the hands of small numbers. Few put those inventions through the stage-gated innovation process because the numbers were small. The inventors that used different stage gates and pushed on without worrying about the eventual numbers succeeded wildly. But, these days, the VCs insist on the orthodox analysis, typical of the consumer commodity markets, that nobody hits one out of the ballpark and pays for the rest. The VCs hardly invest at all and insist on the immediate installation of the orthodoxy. This leads us to stasis and much replication of likes.

I see these geometry changes as smooth just as I see the Poisson to normal to high sigma normals as smooth. I haven’t read about differential geometry, but I know it exists. Yet, there is no such thing as differential statistics. We are stuck in data. We can use Markov chain Monte Carlo (MCMC) to generate data to fit some hypothetical distribution from which we would build something to fit and test fitness towards that hypothetical distribution. But, in sampling that would be unethical or frowned upon. Then again, I’m not a statistician, so it just seems that way to me.
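For readers who have not seen MCMC generate data toward a hypothetical distribution, here is a minimal random-walk Metropolis sketch. The target (a standard normal), the step size, the burn-in, and the seed are all assumptions made for the example. Nothing here comes from a particular library.

```python
import math
import random
from statistics import mean, stdev

def log_target(x):
    """Log density, up to a constant, of the hypothetical target: a standard normal."""
    return -0.5 * x * x

def metropolis(n_samples, step=2.0, burn_in=1000, seed=3):
    """Random-walk Metropolis sampler with a symmetric uniform proposal."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for i in range(n_samples + burn_in):
        proposal = x + rng.uniform(-step, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        log_ratio = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        if i >= burn_in:
            samples.append(x)
    return samples

draws = metropolis(20000)
print(f"mean={mean(draws):.2f} stdev={stdev(draws):.2f}")
```

The generated data should land close to a mean of zero and a standard deviation of one, the shape of the distribution we hypothesized rather than anything we measured.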

I discussed geometry change in Geometry and numerous other posts. But, in hunting up things for this post, I ran across this figure. I usually looked at the two-dimensional view of the underlying geometries. So this three-dimensional view is interesting. Resize each geometry as necessary and put them inside each other. The smallest would be the hyperbolic geometry. The largest geometry, the end containment, would be the spherical geometry. That would express the geometries differentially in the order that they would occur in the technology adoption lifecycle (TALC) working from the inside out. Risk diminishes in this order as well.

Geometry Evolution w TALC

In the above figure, I’ve correlated the TALC with the geometries. I’ve left the technical enthusiasts where Moore put them, rather than in my underlying infrastructural layer below the x-axis. I’ve omitted much of Moore’s TALC elements, focusing on those placing the geometries. The early adopters are part of their vertical. Each early adopter owns their hyperbola, shown in black, and seeds the Euclidean of their vertical, shown in red, or normal of the vertical (not shown). There would be six early adopter/verticals rather than just the two I’ve drawn. The thick black line represents the aggregation of the verticals needed before one enters the tornado, a narrow phase at the beginning of the horizontal. The center of the Euclidean cylinder is the mean of the aggregate normal representing the entire TALC, aka the category borne by that particular TALC. The early phases of the TALC occur before the mean of the TALC. The late phases start immediately after the mean of the TALC.

The Euclidean shown is the nascent seed of the eventual spherical. Where the Euclidean is realized is at a sigma of one. I used to say six, but I’ll go with one for now. Once the sigma is larger than one, the geometry is spherical and tending to more so as the sigmas increase.

From the risk point of view, it is said that innovation is risky. Sure, discontinuous innovation (hyperbolic) has more risk than continuous (Euclidean), and commodity continuous (spherical) has less risk still. Quantifying risk, the hyperbolic geometry gives us an evolution towards a singular success. That singular success takes us to the Euclidean geometry. Further data collection takes us to the higher sigma normals, the spherical space of multiple pathways to numerous successes. The latter, the replications, are hardly risky at all.


Nesting these geometries reveals gaps (-) and surpluses (+).





The Donut/Torus Again

In an earlier post, I characterized the overlap of distributions used in statistical inference as a donut, as a torus, and later as a ring cyclide. I looked at a figure that described a torus as having positive and negative curvature.


So the torus exhibits all three geometries. Those geometries transition through the Euclidean.
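The three geometries on a torus follow from its Gaussian curvature, which for a standard torus with major radius R and minor radius r is K = cos(v) / (r (R + r cos(v))), where v is the angle around the tube. The sketch below evaluates that formula at three places. The radii of 2 and 1 are arbitrary, and mapping positive, zero, and negative curvature onto "spherical", "Euclidean", and "hyperbolic" is my labeling for this post's purposes.

```python
import math

def torus_gaussian_curvature(v, major_radius=2.0, minor_radius=1.0):
    """Gaussian curvature of a torus at tube angle v (0 = outer equator, pi = inner equator)."""
    return math.cos(v) / (minor_radius * (major_radius + minor_radius * math.cos(v)))

def geometry_at(v, tol=1e-9):
    """Label the local geometry by the sign of the curvature."""
    k = torus_gaussian_curvature(v)
    if k > tol:
        return "spherical"
    if k < -tol:
        return "hyperbolic"
    return "Euclidean"

for label, v in (("outer equator", 0.0),
                 ("top circle", math.pi / 2),
                 ("inner equator", math.pi)):
    print(f"{label}: K={torus_gaussian_curvature(v):+.3f} -> {geometry_at(v)}")
```

Moving around the tube from the outside to the inside, the curvature passes through zero on the way from positive to negative, which is the transition through the Euclidean.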

The underlying distributions lie on the torus as well. The standard normal has a sigma of one. The commodity normal has a sigma greater than one. The saddle and peaks refer to components of a hyperbolic saddle. The statistical process proceeds from the Poisson to the standard normal to the commodity normal. On a torus, the saddle points and peaks are concurrent and highly parallel.

Torus 3