Archive for January, 2018

Skewed Normal

January 28, 2018

As a baseline, we’ll start with a top-down view of a normal distribution. The typical view is a side view. In the top-down view, the normal is the center of some concentric circles. In our graph, the concentric circles will have radii defined in terms of the statistical unit of measure, standard deviations. I’ve shown circles at 0, 1, 2, 3, and 5 standard deviations. The mean is shown at 0 standard deviations.

The core of the distribution is shown in orange. The horizontal view of the distribution defines the core as lying between the inflection points (IP) of the normal curve. The core in a normal is the cylinder from the plane of the inflection points to the base of the distribution. The horizontal view can be rotated to align with the plane cutting the distribution for a particular dimension, shown here as D1, D2, and D3. We only have three dimensions in this normal. With n dimensions, there would be n slices. With the normal, as long as the distribution is sliced through the mean, all the 2D projections would look the same. The normal is a symmetric distribution.

With a normal distribution, the mean, median, and mode have the same value. This, along with being symmetric, is a property of the normal. More specifically, of a non-skewed normal. A standard normal is not skewed.

The normal can be estimated with a Poisson distribution given at least twenty data points. The Poisson distribution will tend to the normal somewhere between twenty and thirty-six data points.
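One way to see that tendency is a small sketch of my own, in which I let the Poisson mean play the role of the data point count (that mapping, and the use of scipy, are my assumptions, not the post's): as the mean grows from 20 to 36, the Poisson pmf gets closer to the matching normal pdf.

```python
# Compare the Poisson pmf against a normal pdf with the same mean and
# variance; the worst-case gap shrinks as the mean grows.
import numpy as np
from scipy import stats

k = np.arange(0, 80)
for lam in (20, 36):
    pois = stats.poisson.pmf(k, lam)
    norm = stats.norm.pdf(k, loc=lam, scale=np.sqrt(lam))
    print(f"lambda={lam}: max |Poisson - normal| = {np.abs(pois - norm).max():.4f}")
```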

The normal is usually used in the snapshot dataset perspective, rather than in a time series sense. But the time series sense is significant when you wonder if you’ve collected enough data. A dataset should be tested for normality, but normality is usually just assumed. The tests for normality are weak.

Once the data achieve normality, the data tends to stay normal. The core and the outliers won’t move, and the standard deviation will stay the same. Until the data achieve normality, the distribution moves and resizes itself.

In the technology adoption lifecycle, the vertical phase will be the first time the normal is achieved. It will be a normal for the carried content. The horizontal, aka the early mainstreet market, has its own normal. The horizontal’s normal is for carrier components. The late market, likewise, has its own normal. In the late market, the focus is on carrier, with the earlier carried discipline representing mass customization opportunities. The laggard and phobic phases are about form factors, carriers. Carried content may change in these phases. Carried content provides an opportunity to extend category life.

Preceding normality, the normal is skewed. In the next figure, I’ve put the skewed normal above the non-skewed normal.

Skewed Normal

Where the normal has a circular footprint, the skewed normal has an elliptical footprint. The median does not move; it tilts. This pushes the mode and the mean apart symmetrically around the median. The blue arrow shows how much the median tilts. The thick blue line shows the side view of the skewed normal. The core is shown in light orange. The tails are significant in the skewed normal. The skewed normal is asymmetrical. More on this later.
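A quick sketch of that separation, using scipy's skew-normal as a stand-in for the skewed normal in the figure (the shape parameter a=4 is an arbitrary choice of mine): under positive skew, the mode, median, and mean spread out in that order.

```python
import numpy as np
from scipy import stats

sn = stats.skewnorm(4)            # a = 4: positive skew, assumed for illustration
mean, median = sn.mean(), sn.median()
xs = np.linspace(-3, 3, 100_001)
mode = xs[np.argmax(sn.pdf(xs))]  # locate the mode as the argmax of the pdf
print(f"mode={mode:.3f}  median={median:.3f}  mean={mean:.3f}")
# With positive skew, the ordering is mode < median < mean.
```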

Each ellipse corresponds to the sigmas of our earlier diagram. But the circular areas are the future. I’ve marked the outliers relative to the circular footprint of the non-skewed normal. The area I’m calling the deep outlier, the dark yellow population, is beyond what would be considered in the non-skewed normal. It would definitely be an error to collect data from that population, and since we sell to populations as we collect data, it would be an error to sell to that segment of the population. Even after normality is achieved, outliers cost more than the revenues generated from that population.

The yellow populations are outliers, but they are outliers to the non-skewed normal. These outliers are shared by both distributions. The light green and even lighter green areas represent non-outlier populations that will be sold to in the later normal, or as we sell to achieve normality. As the skewed normal achieves non-skewed normality, the ellipses will become circles. The edges located along the x-axis will move to the right. The tilted median will stand up until it is vertical, and the mode and mean will converge to the median.

The ellipses should be thinner than shown. The probability mass under both distributions equals one, so each ellipse would be narrower vertically than the corresponding circle. I have no idea exactly how wide those ellipses should be, but the figure is definitely wrong on this point.

The skewed distribution exhibits kurtosis. I disagree with the idea that kurtosis has anything to do with peakedness. Other statisticians have made this same argument to me. The calculus view of the fourth moment disagrees with peakedness as well. Kurtosis is about the tails and the shoulders as they relate to the cores. Some discussions ignore the shoulders. In this figure, I’ve included shoulders. I’ve used thick red lines and red text to highlight the components of the normal (N) and the skewed normal (SN). The normal only has one set of components. The skewed normal has two sets of components: one on the left, and another on the right.
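A small check on the tails-not-peak reading (my example, not from the post): scipy reports excess kurtosis, and it tracks tail weight. The flat-topped uniform, which has no tails, comes in negative; the heavy-tailed Laplace comes in strongly positive.

```python
from scipy import stats

# Excess kurtosis (normal = 0) for three distributions:
print("uniform:", stats.uniform.stats(moments="k"))   # flat top, no tails: -1.2
print("normal: ", stats.norm.stats(moments="k"))      # baseline: 0.0
print("laplace:", stats.laplace.stats(moments="k"))   # heavy tails: 3.0
```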

I highlighted the shoulder of the normal. I highlighted the right and left shoulders of the skewed normal. And, lastly, I highlighted the right and left tails of the skewed normal.

The shoulders and tails are related to the cores. The normal core is a circle. The light orange ellipse of the skewed normal sits on top of it. I labeled both cores. The purple rectangle above the cores is the core of the skewed normal. The black core is the core of the non-skewed normal.

Kurtosis defines the curvature (κ) of the tails. I usually show these as circles defined by κ = 1/r. These circles are tangent to the tails of the normal. In a normal, these circles are the same size for both tails. In a skewed normal, the circles are vastly different in size. In both cases, these circles generate a topological object: a torus for the normal, and a ring cyclide for the skewed normal. These topological objects are generated as we rotate 360 degrees around the median or mean of the normal. I showed this topological object in dark orange. In this figure, I showed them as ellipses. The circular version made the diagram very large. The ellipse for the ring cyclide on the left side is large. On the right, it is very small. This is due to the horizontal slice through the 3D objects. The xy-plane was used to produce the slice through both objects. Both objects are smooth and continuous, so another slice through the median would show a smaller circle on the left and a larger circle on the right. At some rotational angle, both circles would be the same, as in both curvatures would be equal. The thick vertical line through the median turns out to be the slice in which both curvatures would be the same. This curvature would be the average curvature.
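The tangent-circle construction can be checked numerically. This sketch is mine (the finite-difference step and the test points are assumptions): for a curve y = f(x), the curvature is κ = |f''| / (1 + f'²)^(3/2), and the tangent circle has radius r = 1/κ. The symmetric normal gives equal curvatures on both tails; the skew-normal does not.

```python
import numpy as np
from scipy import stats

def curvature(f, x, h=1e-4):
    # kappa = |f''| / (1 + f'^2)^(3/2), via central differences
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
    return abs(d2) / (1 + d1 ** 2) ** 1.5

phi = stats.norm.pdf
# Symmetric normal: equal curvature on both tails (the torus case).
print(curvature(phi, 2.0), curvature(phi, -2.0))
# Skew-normal: the two sides differ (the ring cyclide case).
sn = stats.skewnorm(4).pdf
print(curvature(sn, -1.0), curvature(sn, 2.5))
```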

When I put the left portion of the torus in the figure, the blue line representing the side view of the normal was incorrectly drawn. The peak should have been at the mode. This was the second surprise. The median has more frequency, but it is tilted at an angle, an angle that makes it less high than the mode. The mode being the highest was one of those not-yet-known pieces of knowledge.

I’ll attempt a multimodal normal with opposing long tails. I was going to try to illustrate such a normal. There can be a multiplicity of centrality tuples, skews, and long tails. With the tools I use now, that would be a challenge.

I’m looking at the Cauchy distribution now. It has no convergence; there is no defined mean to converge to. But Cauchy sequences converge based on ε. You can pick your convergences. A footprint would be zeros. Different values of ε would give different footprints, and different conclusions of the underlying logical argument, in the triangle model sense of the width and depth of a conclusion.
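The non-convergence is easy to see in simulation (the seed and sample sizes below are my choices): the running mean of Cauchy draws keeps lurching no matter how much data arrives, while the running mean of normal draws settles down.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
steps = np.arange(1, n + 1)
cauchy = np.cumsum(rng.standard_cauchy(n)) / steps   # running mean, Cauchy draws
normal = np.cumsum(rng.standard_normal(n)) / steps   # running mean, normal draws
# Checkpoints at n = 1,000, 10,000, 100,000:
print("Cauchy running mean:", cauchy[[999, 9_999, 99_999]])
print("Normal running mean:", normal[[999, 9_999, 99_999]])
```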

The first thing that surprised me in this post was how a portion of the outliers, the deep outliers, of the skewed normal is too far away from my market. And, how other portions of the outliers are outliers in both distributions. Another example of writing to think, rather than writing to communicate. Sorry about that.

Care must be taken to ensure this if you are going to market to outliers. I won’t.





Followup on The Dance of a Normal

January 15, 2018

When I wrote the post, The Dance of a Normal: Data Quantity and Dimensionality, I didn’t tie it back into product management. My bad. I’ll do that here. I was reminded that I needed to get that done by John D. Cook’s blog post, “Big data is not enough.”

When we construct a normal from scratch, we need 20 data points before we can estimate the normal, and then make any inferences with that normal. That’s 20 data points of the same measurement in the same dimension. If that measurement involves data fusion and we change that fusion, we have a different measurement, so we need to segregate the data. If we change a policy or procedure, those changes will change our numbers, or basis. If we change our pragmatism slice, those numbers will change. If we had enough data, each of those changes would be an alternative hypothesis of its own. Hopefully, they would intersect each other so we could test each of those hypotheses for correlation. But we can’t just aggregate them and expect to make valid conclusions, even if we now have 80 data points and a normal.

With those 20 data points, we have a histogram. We will also have kurtosis when we tell our tools to generate a normal from those 20 data points. We will have to check to see how many nomials we have. Each nomial will have a mean, median, and mode of its own. Those medians lean. Those medians remain the statistic of centrality while the mode and mean move out into the skew.

While you can estimate a normal from 20 data points, don’t expect it to be the answer. There is more work to be done. There is more logic involved. There is more Agile development to do. Don’t move on to the next thing until you have 36 data points for that dimension. If you release some new code, start the data point count over. This implies slack.

When I was managing projects, the mean would converge. When you see the same mean several days in a row, you’ve converged. Throw the data out and collect new data. Once the data converges, it is hard to move the number. Your performance might have changed, but the number hasn’t. Things hide in averages.
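A deterministic sketch of why things hide in averages (the numbers are illustrative assumptions of mine): after 200 stable observations, a real shift in performance barely moves the all-history mean, while a trailing window sees it immediately.

```python
import numpy as np

old = np.full(200, 10.0)    # long history at a level of 10
new = np.full(20, 12.0)     # performance actually moved to 12
data = np.concatenate([old, new])
cum_mean = data.cumsum() / np.arange(1, len(data) + 1)
# The all-history mean ends near 10.18, nowhere near the true current level of 12.
print(f"all-history mean: {cum_mean[-1]:.2f}")
print(f"trailing-20 mean: {data[-20:].mean():.2f}")
```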

Beware of dimensions. A unit of measure could be more than one dimensional when it’s used in different measurements. What is the logic of this sensor versus another? What is the logic of the illuminator? What is the logic of the mathematics? Are we assuming things? A change in any of that brings us to a new dimension. Write down the definition of each dimension.

The statistics for each dimension and each measurement take time to reach validity. The rush to production, to release, to iteration leaves us with much invalidity until we reach validity. The numbers your analytics kick out won’t clue you in. Kurtosis can give you a hint if it is not swamped. Slow down.

Once you have achieved normality with a measurement, how many sigmas do you have: 1, 3, 6, >6, 60? At three, your underlying geometry changes from Euclidean to spherical. Your business will change when your sigma is greater than six. You will have more competition and the number of fast followers will explode.

Adding data points will change the normal, which in turn changes the outliers. This will be even more the case when you attend to the changes to your dimensions and measures, and your TALC phases and pragmatism slices. The carried and carrier will have their own dimensions and measures. They will also have different priorities, and levels of effort. When moving from a carried layer to a carrier layer, the outliers would be different, because the carried and carrier have their own normal distributions, each with their own dimensions and measures. The emphasis changes, so the statistics change. The populations across the stack differ widely.

So much mess can be made with metrics. Gaps in the data happen. The past hangs around to assert itself in the future. When you drive down a road, adjacent houses can be from different decades. Data is likewise. The infrastructure helps eliminate gaps and the misallocation of data. It’s not as simple as measuring to manage; you have to manage to measure.



Burst-and-Coast Swimming

January 13, 2018

Twitter brought this article to my attention:

 “In contrast with previous experimental works, we find that both attraction and alignment behaviors control the reaction of fish to a neighbor. We then exploit these results to build a model of spontaneous burst-and-coast swimming and interactions of fish, with all parameters being estimated or directly measured from experiments.”


“We disentangle, quantify, and model the interactions involved in the control and coordination of burst-and-coast swimming in the fish Hemigrammus rhodostomus. We find that the interactions of fish with the arena-wall result in avoidance behavior and those with a neighbor result in a combination of attraction and alignment behaviors whose effects depend on the distance and the relative position and orientation to that neighbor. Then we show that a model entirely based on experimental data confirms that the combination of these individual-level interactions quantitatively reproduces the dynamics of swimming of a single fish, the coordinated motion in groups of two fish, and the consequences of interactions on their spatial distribution.”

“Disentangling and modeling interactions in fish with burst-and-coast swimming reveal distinct alignment and attraction behaviors,” Daniel S. Calovi et al.

So what does this have to do with product management? It boils down to the technology adoption lifecycle (TALC). It basically describes organizing behavior, the organizing behavior of clients, customers, users, and markets.

Burst-and-coast swimming happens in the buy. An initial sale is a big effort on the part of the buyer and seller. Back when we sold software, the initial sale generated a large commission for the sales rep, and the subsequent upgrade sales generated smaller commissions: burst-and-coast. Then came the install, burst-and-coast. Once we get our own software installed, we use it and hope we never have to deal with the guts of the application ever again. Well, that tells you I’m not a hacker or a developer. If the effort is too high, I bail. Sorry, my bad.

Attraction and alignment behaviors control the reaction of a vendor to neighboring vendors. And the customers do likewise. Once you get that first B2B early adopter in the first pragmatism slice, the client, and start selling to the first-degree-of-separation prospects in the adjacent pragmatism slice, you see this burst-and-coast behavior. In the market, the followers follow the leader, the apex predator. The vendor and the vendor value-chain members do the same. Even fast followers don’t get ahead of the leader. The leader pays a price to lead. They own the burst. The fast follower doesn’t have the capabilities it needs to be the leader. The pragmatism slices are the arena walls. Address only the next pragmatism slice, not the current one, or the past ones. The pragmatism slices are not random.

For each adjacent slice and subsequent adjacencies, the business model must convince, so even the arguments, the explanations, burst-and-coast. During that initial client engagement, we build the client’s product visualization with our underlying technology; then, in preparation for the chasm crossing, we build the first business case, and we have to help ensure the client achieves that business case. The chasm crossing is one of those arena walls sitting between two pragmatism slices that is problematic enough to warrant elevation to a TALC phase. We have plenty of time to ensure the business case. We are constantly addressing that business case. “Software By Numbers” describes how the client engagement should proceed. With each release, we have to convince the funding early-adopter client to fund the next release. Each release has to make the case for the next release. We cannot deliver the whole thing all at once. With Agile, we make the case at the level of each feature making its own case. So we have a school of code that heads off in a single direction effortlessly, or a school of developers, or a school of vendors and value chains. But the convincing case for the client is not persuasion. It’s an obvious pathway in the functionality that takes the client to their value proposition by enhancing their competitive position, their place in the larger school of fish.

In the B2B early adopter phase, we are focused on the carried content. We have to build the carrier functionality at the same time, but that is in the background. The first business case is specific to the client and their role in the industrial classification tree, their specific subtree, no higher, lower, or wider. Care must be taken to keep the subtree small in the beginning. Stage-gate on the subtree. How much is enough? The big picture is hyperbolically far away, and looks deceptively, unsustainably small. You’re ten years away; this is no overnight unicorn. But back to the fish: when you deliver to two channels, the carrier and the carried, you have two schools of fish. They don’t swim together. There are arena walls.

In the next phase, the vertical market, there are more fish and more pragmatism slices. Sales will be random. Sales will ignore those arena walls. They are chasing money, not product evolution, subtree focus, or the future. Yes, we must pay for today and tomorrow, but outliers are costly and not an opportunity to swamp all the other considerations. Having one big customer is a bad, but attractive-looking, proposition. The vertical itself is an opportunity. Plenty of companies started, did business, and exited within this phase, never moving to the next phase. Companies lived comfortably in the vertical. It’s their school of fish. They go together.

Preparing for the next phase, the horizontal, or the IT horizontal, requires us to shift from carried to carrier. Preparation should have started years or months ago. Worse, this phase will be about aggregating all the applications written on top of the same underlying technology. Vertical products will be rewritten as templates on a single carrier. In the vertical, the carrier should have stayed unified and similar. It’s a different architecture. The school of fish that is the customers will be much larger and wider, but the customer is now the IT department. The previous customer was the non-IT, business unit executive.

Yes, this is not what we do today. These days everything is continuous, small, short-term financed, exiting soon, not changing-the-world stuff. But it is also not what they did yesterday either. I’m writing from the full-TALC course, not the entering-the-late-market-only course, aka starting in the middle, or near the end. The problems we face today from globalism won’t be solved with the innovations we do these days, the continuous stuff; the science- and engineering-free, management-only innovation. So start something that will last 50-plus years, something that starts at the beginning and exploits every phase of the TALC until it ends up in the cloud.

Discontinuous innovations give rise to entirely new value chains, new careers, unimagined futures, and unaddressed sociological problems that we have not addressed during the youth of the software age. We are older now. We are more orthodox now. Yes, that business orthodoxy is a school of fish. We used to swim outside those fish, but they have joined us because our venture funding still works, while theirs, the banks, don’t make loans like they used to. The school of fish that is the banks has moved on, so the orthodoxy sees the innovators as prey, and we apparently agree. We have not pushed back and said, hey, this donut shop is not an innovation. But we are being taught a buzzworded definition of innovation.

Anyway, you grasp what I’m calling the burst and coast: the never-before-thought and the commodity, the innovator and the orthodoxy. Many fish of many species, all self-organizing and structured in ways that are difficult to see. We will meet many along the way.

And, one last thing: read widely. The last thing that will teach us anything these days is the innovation press. Always ask yourself what this can teach you, regardless of what you’re focused on these days.



The Dance of a Normal: Data Quantity and Dimensionality

January 10, 2018

Last night I read a post on John Cook’s blog, “Formal methods let you explore the corners.” In it, he mentioned how, with high dimensionality, most of the mass of a box is in its corners, outside the inscribed sphere. He put a circle in a box to illustrate his point.
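That corner-heaviness can be computed directly (the formula is the standard volume of a d-ball; the choice of dimensions is mine): the fraction of a cube's volume inside its inscribed sphere collapses as the dimension grows, so in high dimensions nearly everything is corners.

```python
import math

def inscribed_sphere_fraction(d):
    # unit ball volume pi^(d/2) / Gamma(d/2 + 1); cube of side 2 has volume 2^d
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) / 2 ** d

for d in (2, 3, 10, 36):
    print(f"d={d:>2}: fraction inside sphere = {inscribed_sphere_fraction(d):.2e}")
```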

Last week there was a figure illustrating this. As the sphere gained additional dimensions, it became more cube-like. Given that a normal distribution looks like a circle when viewed from the top down, I drew what a high-dimension normal would look like as it moved from n=1 to n>36, and as dimensionality moved from 0 to high dimensionality. Of course, I threw in a few other concepts, so the figure moved from this goal to more. That’s drawing to think.

Increases in data points and dimensionality

Low and High Dimension Normals

The figure on the left provides a top-down view of the normal, as a circle, as n increases, and dimensionality increases.

At n=1, the distribution starts off with a Dirac function exploding off to infinity. The height of the line representing the Dirac function contains the entire probability mass of the distribution. Given the usual z-score representation, a line by itself can’t be a probability because we need an interval. The second data point will show up quickly enough to put an end to that quandary. Exhale. The point representing the line of the Dirac function is in the center of the concentric circles under the wider purple line.

A few more data points arrive. At n=5, we have a cluster around the black point in the center of the concentric circles. Here some of the probability mass has flowed down to the x-axis and outward into the distribution. That distribution is not normal yet. These data points would present as a Poisson distribution, or a set of histogram bars on a line. Here that line would be a curve, so the data points would sit on a curve. This cluster is shown with darker gray.

At n=20, the Poisson distribution would tend to the normal. This normal is comprised of a core and tails. These are concentric circles centered at the mean, the black point in the center, the point representing the line of the former Dirac function. The core is shown as a lighter gray circle; the tail, the lightest gray. As the number of data points increases, the width of the core and the tails grow.

As the number of data points grows, the normal distribution loses height, and the probability mass that comprised that height moves outward into the core and the tails. Black arrows to the right of the mean show this outward movement. The circles representing the core and the tails get wider. Once there are 36 data points, the width and height of the normal stabilize. As more data points are added, not much will change.

All of these changes in width and height were relative to a low number of dimensions. When you have fewer than 36 data points, the distribution would be skewed. This is ignored in the above figure. But each dimension would be skewed initially and become normal as data points for that dimension are added. This figure is drawn from the perspective of a normal with more than 36 data points, hence no skew. Skew would appear in a top-down view as an ellipse.

Consider each dimension as having its own normal. Those normals are added together as we go. I do not know where the threshold between low dimension and high dimension normals would be. The high dimension normal footprint is shown as a rounded-off square, or a squared-off circle. It is pink. The corners get sharper as the number of dimensions increases. A black, double-sided arrow indicates the boundary between low dimension and high dimension footprints.

I used a light blue circle to demonstrate how the density in the high dimensional normal is not even. When a tail ends up in the corner, it is longer and the circle tangent to the normal curve is bigger. When a tail ends up on the side, it is shorter and the circle tangent to the normal curve is smaller. These black circles and ellipses represent the intrinsic curvatures, or kurtosis, of the tails, each given by the inverse of its radius.

The normal we are used to viewing is a two-dimensional slice through the mean, so we have two tails. In a three-dimensional normal, we can rotate the slicing plane through the mean and get another two tails. With the standard normal, all the slices would look the same. The tails would be the same. The circles representing the intrinsic curvatures would be the same. But, when the normal is skewed, the slices would differ.  The tails would differ with one side being longer than the other. The circles representing the intrinsic curvatures would differ as well. The shorter tail would give us a smaller, tighter circle. The longer tail would give us a larger, looser circle.

If we rotated our slicing plane around the normal through the mean, in the high dimensionality situation, we would see the tails being the same on both sides of a given slice, but each slice would have tails of a different length. In the low dimensionality situation, the tails would be the same in all slices.

The intrinsic curvatures are shown in black on the left side of the normal. I’ve put red spheres inside those curvatures to hint at the topological object, the aggregate of those spheres, shown with the thick red lines, lying on top of the normal.

The pink footprint meets the light blue circle at the rounded-off corners of the footprint of the high dimensional normal but diverges at the sides. There is no probability mass at the sides, as it flowed into, or was pushed into, the distribution envelope, suggesting higher densities inside the distribution along the sides. The light blue arrows indicate this.

The corners have the longest tails. The sides have the shortest tails. Given that the slices made by the planes slicing the distribution through the mean are symmetric, the tails are the same on both sides of the mean.

Black Swans and Flow of Probability Density

In the center figure, I showed the usual side view of the normal. I drew two pink lines to show where the high dimensional footprint ended. The high dimensional footprint has less width except at the corners, so rotating the high dimensional normal relative to the low dimensional normal would move those pink lines. This reflects a risk mechanism similar to skew risk and kurtosis risk.

I projected those pink lines. Superimposing the low and high dimensional normals presents us with two black swans if we go with the x-axis, or two shorter tails if we go with the x’-axis. The two black swans appear as cliffs, the horizontal lines between the x and the x’ axes. The length of those lines represents the thickness of the loss. The tail volumes were lost between the pink lines and the outer gray lines labeled Prior Tail. The blue rectangles beside the distribution indicate where the tail volumes were lost. Where tail volumes were lost, we renormalize the distribution. In a high dimensional normal, these volumes would be small. These volumes contain probability masses in low dimensional normals. In high dimensional normals, all the probability density is on the surface of the distribution.

In the center figure, I’ve used thin, light blue arrows to clarify the flow of probability density from the Dirac function into the normal.

Intrinsic Curvature

The figure on the right uses a side view of the normal to illustrate the effects of skew, and the presence of a torus or, more accurately, a ring cyclide. I first discussed this ring cyclide in The Curvature Donut.

The purple line in the figure on the left represents the x-axis of the horizontal view of the normal. In the figure on the right, the purple line is the x-axis. I used a standard normal but added the circles representing the intrinsic curvatures in red. Since the standard normal is symmetric, both of the outer intrinsic curvatures are the same size. This symmetric situation gives us a torus topologically. The torus sits flatly on top of the tails of the normal. This is the high dimension, high number of data points case. Then, I hinted at a skewed distribution, aka the low number of data points case, with the angled line of the median and a short tail. That short tail would have a smaller circle representing its intrinsic curvature. This gives us a ring cyclide topologically. The ring cyclide sits tilted on the tail of the normal.

I then superimposed the smaller circle and the larger one from the skewed situation. The smaller circle represents the maximal curvature; the larger circle, the minimal curvature. Then, with the black circle, I averaged the two, so I could get down to one kurtosis value. Kurtosis is one number. You might tell me that it represents the kurtosis of the standard normal, but skew is tied to kurtosis, so there should be two numbers, since the two would not be equal. This averaging business is just my guess. I still don’t see the height of the distribution as being indicated by kurtosis. Still wondering.

Kurtosis as peakedness is stated all over the literature as the ground truth, but a few authors and I say that that doesn’t make sense. The fourth moment, the calculus definition of kurtosis, has nothing to do with peakedness.

The x and x’ axes here are accidental. The black swans show up as well. Again, accidental.

I left a comment about multidimensional normals in John Cook’s blog. He replied while I was writing this. I will have to think about it a while, and I may have to revise this. See his Willie Sutton and multivariate normal distribution.

As always, enjoy.