Archive for February, 2018


February 25, 2018

A tweet sent me to “Mean, Median, and Skew: Correcting a Textbook Rule.” The textbook rules put the mean in the long tail and the mode in the short tail. The author discussed exceptions to these rules. Figure three presented me with a distribution the author claims is one of those exceptions, a distribution he calls a binomial. I annotated the figure. It’s definitely some kind of a nomial, but looking closer, it is not a binomial.

00 Original Alleged Binomial

The nomial on the right side of the distribution shows us what we see on the side of any normal: an aggregate curve composed of a concave downward curve and a concave upward curve with a single inflection point between them.

The distribution on the left side is not the result of a single nomial. There are too many inflection points. The left side of the distribution is concave down, concave up, concave down, and concave up. We can say the left tail is a single tail comprised of two presented lines, or we can say they are the overlap of two different distributions. That second concave down hides a distribution inside the base distribution.
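The concavity argument can be checked numerically: count the sign changes of the second derivative along the curve. A minimal sketch, assuming a clean two-normal mixture versus one with a third component tucked under the left peak; all of the weights, means, and standard deviations here are invented for illustration, not taken from the author's figure:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def inflection_count(components, lo=-6.0, hi=9.0, step=0.01):
    """Count sign changes of the numerical second derivative of a mixture pdf."""
    def f(x):
        return sum(w * normal_pdf(x, mu, sd) for w, mu, sd in components)
    xs = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    second = [(f(x + step) - 2 * f(x) + f(x - step)) / step ** 2 for x in xs]
    return sum(1 for a, b in zip(second, second[1:]) if a * b < 0)

two = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]                          # a clean "binomial"
three = [(0.35, 0.0, 1.0), (0.15, -1.8, 0.45), (0.5, 4.0, 1.0)]   # a nomial hidden on the left

print(inflection_count(two))    # four: one per concavity change
print(inflection_count(three))  # more: the hidden nomial adds inflection points
```

A clean two-normal mixture gives four inflection points; the hidden component shows up as extra ones, which is exactly what the left side of the figure exhibits.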

The distribution gets called a binomial because it has two prominent peaks. But the left peak is an aggregate of at least one more nomial. Otherwise, we would not see that extra set of inflection points. When making an argument about where the mean, median, and mode are, we have to consider each nomial to have its own triple. So there should be at least two triples, rather than the one shown in the figure. I called the triple we were presented with an error, but it does present us with one of the exceptions the author wants to talk about. From this, we can take away the idea that these aggregate statistics hide more than they inform. I found myself in a Quora discussion on separating the underlying distributions of a binomial. There is math for that, math I do not know yet.

I am working on the assumption that all the underlying distributions are normal, a base assumption that is routinely made in statistics.
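That separation math is mixture modeling, and the standard tool is the EM algorithm. Here is a minimal sketch under the same normality assumption, fitting two hypothetical normals to pooled data; the sample, the true means of 0 and 5, and the initialization scheme are all my inventions for illustration:

```python
import math
import random

def em_two_normals(data, iters=60):
    """Fit a two-component normal mixture to 1D data with plain EM."""
    data = sorted(data)
    n = len(data)
    mu = [data[n // 4], data[3 * n // 4]]   # crude initialization from the quartiles
    sigma, w = [1.0, 1.0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: each point's responsibility under each component
        resp = []
        for x in data:
            p = [w[k] * math.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2) / sigma[k]
                 for k in (0, 1)]
            s = p[0] + p[1]
            resp.append((p[0] / s, p[1] / s))
        # M-step: re-estimate weights, means, and standard deviations
        for k in (0, 1):
            rk = sum(r[k] for r in resp)
            w[k] = rk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / rk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / rk
            sigma[k] = math.sqrt(max(var, 1e-9))
    return w, mu, sigma

random.seed(7)
sample = [random.gauss(0, 1) for _ in range(400)] + [random.gauss(5, 1) for _ in range(600)]
w, mu, sigma = em_two_normals(sample)
print(sorted(mu))  # should land near the true means, 0 and 5
```

With well-separated peaks like these, EM recovers the component triples that the aggregate hides; with heavily overlapping nomials, it becomes much less certain.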

The graph hides much as well, so I drew what I expected the distributions under the given “binomial” would be. I just eyeballed it.

01 More Nomials

I used arrows that match the color of the curve to show the concavity. Extra probability mass shows up at the intersections where distributions meet. I’ve labeled the probability mass at the intersections as gaps. Given the underlying distributions are only approximations, I didn’t make the green distribution, distribution 1, fit perfectly, so the thin layer of the second gap from the beginning lies on top of the curve without involving a distribution. I used three different distributions to account for the tail convergence on the right. This gave rise to a gap. I didn’t catch this when I drew the figure. As I write this, there is no gap there; the red distribution accounts for that probability mass.

I went with a skewed distribution, distribution 1, to account for the second concave down section of the curve on the left side of the “second” nomial. A normal wouldn’t bulge outward under the exterior nomial, the black normal. A skewed normal has a long tail and a short tail. The intrinsic curvature of any long tail is low, so it has a large radius. The intrinsic curvature of any short tail is high, giving us a small radius. The mean of this distribution is to the left. The mean and mode are pushed apart symmetrically about the median. The median for distribution 1 leans to the right.

I went with three peaks on the left side of the “binomial.” I did this because distributions 2 and 4 have different heights. I know of no rules that would drive this decision. They could easily be one distribution.

The rest of our “binomial” is, as demonstrated, actually a multinomial. We’ve ended up with five distributions, so we would have five different triples of mean, median, and mode. These triples were aggregated in the author’s numeric results. We can take it that when the mean, median, and mode are the same, we have a standard normal, and the textbook rules about the tails and their relationships to the mean and mode still stand. Otherwise, we have numbers generated from an aggregate normal.
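The aggregate triple can be made concrete. For a mixture, the aggregate mean is just the weighted mean of the component means, and it can land in the valley where hardly any of the population lives. A sketch with two invented nomials, 30% of the mass at 0 and 70% at 5:

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Standard-width normal density centered at mu."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# two made-up nomials: weight, mean
components = [(0.3, 0.0), (0.7, 5.0)]

def mixture_pdf(x):
    return sum(w * normal_pdf(x, mu) for w, mu in components)

aggregate_mean = sum(w * mu for w, mu in components)  # 0.3*0 + 0.7*5 = 3.5
print(aggregate_mean)
print(mixture_pdf(aggregate_mean) < mixture_pdf(0.0))  # the aggregate mean sits in the valley
print(mixture_pdf(aggregate_mean) < mixture_pdf(5.0))
```

The aggregate mean of 3.5 describes almost nobody; each component's own triple does the describing.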

Don’t just accept the “binomial” allegation. If the numbers don’t make sense, they don’t make sense. When numbers don’t make sense, you’ve got more sense to make.

As a product manager, I don’t want to aggregate and drive that into a product that fits no one.

I went on to play with the “binomial” distribution some more.

I started with vertical slices, as in a Riemann integral. I also did this to give me a hint about the factors involved in each slice. Due to my use of raster graphics, some slice lines are thick because the intersections of the distributions are not points. Some intersections are lines. The point intersections give rise to vertical lines. The line intersections give rise to rectangles. Each vertical slice in those rectangles can differ; they are not uniform. Individual slices would still look like solid rectangles.

02 Some Vertical Slices

The vertical lines tell us that at that moment in time, our organization, if we worked at the underlying granularity, would make some management adjustment to serve the underlying populations appropriately. This applies to both the gray and light blue lines or rectangles.

The blue lines show us where the associated distribution converges with the horizontal axis. That horizontal axis would move relative to any upmarket or downmarket moves the organization was undertaking over a period of time. I labeled these as ordering changes. But, the gray lines are ordering changes as well. Orderings come up when computing binomial probabilities and in game theory.

The pink area shows the expanse of a single-factor mixture. Part of that area shows the factor associated with the black distribution quickly slowing down. I labeled that part of the black curve “Fast.” It also shows the factor’s deceleration; I labeled that part “Slow.” Otherwise, this slice is relatively stable. Note that growth is not a positive notion here. In fact, the late phase of the technology adoption lifecycle, the orthodox management phase, is post-growth and in decline, constant decline. The only options are to focus, an upmarket move, or to drop the price and move downmarket. Neither guarantees growth in itself.

From the mean of distribution 5, the purple distribution, all factors are in decline. But in the pink area, the factors are organized by a single constant factor curve.

In Upton’s “The Aesthetic of Play,” the pink zone is a single play space. In his book, rules generate spaces, and those spaces dictate process and policy. The technology adoption lifecycle (TALC) is based on this idea, but it is based on populations organized by each population’s pragmatism. The business facing that play space or population must eliminate its process and policy impedances to succeed. Addressed impedances constitute your organization’s design.

These spaces are part of the difficulty with those nascent moments when we don’t yet have a normal, when we are bringing another discontinuous innovation to market while sitting in a space where the company’s category is dying. The pink space is that end-of-life space. Notice how different the pink space is from any slice on the left side of the aggregate distribution.

Upmarket and downmarket moves move the feet of the distributions, the points of convergence with the horizontal. The new space might have additional intersections of the nomial distributions. Where this is the case, the factors for the new slices would change. This would repartition the existing populations as well. Where the nomials are normal, the additional populations gained by the move would not change the nomials other than at the feet. In upmarket moves, keep the populations large enough to maintain normality, or expect exposure to kurtosis risk.

In our diagrams, the red distribution seems high, which implies that it needs more density; the number of data points needs to be increased. This also implies that there should be some skew, but it is not apparent. As a distribution gains probability mass, it becomes lower and wider.

When looking for inflection points, those points can be lines. The nomial on the right exhibits that behavior. I went looking for what that means mathematically. The inflection point is ambiguous. I crossed paths with symplectic geometry, which deals with the same problem. The nice thing businesswise about this ambiguity is that it grants you some time to switch from growth to decline or from fast to slow. The underlying processes of the business need to change at all inflection points. The difference between a point and a line is that a point is a sudden change requiring proactivity, while a line requires less proactivity.

Then, I wanted to see the toruses involved. So I started with the normal distribution on the right side of the “binomial.” I used the original distribution, not the teased-out distributions, so the distribution on the left only exposed its left side. Fitting a circle to the curve on the left was less clear.

03 Curvature

Imagine if a tori pair was shown for each of the five distributions. Where a tori pair does not have the same radius in each constituent circle, there would be kurtosis, a pair of tails, and a median lean. The radii of the circles in that pair would change as the 2D slicings were rotated around the underlying distribution. The median lean results from the particular dimensions of the 2D slice. This generates some ambiguity in the peak, as the median for each slice would differ. By slicings, I mean taking slices around the circle giving us a collection of different slices. I do not mean rotating the same slice.

Where a tori pair had the same radius, the distribution has achieved normality. The kurtosis would be near zero, the median would no longer lean, and the mean, median, and mode would converge to the same value. The radii of the circles would not change as the 2D slicings were rotated.

Next, I took horizontal slices as in Lebesgue integrals.

04 Some Horizontal Slices

As discussed in regards to the vertical slicing, the gray lines indicate point intersections. The thicker gray lines indicate line intersections.

Where the vertical slice figure showed gaps, those gaps are comprised of a collection of Poisson distributions and a single collective normal. A Poisson distribution comes to approximate the normal when it has 20 or more data points. The normal is achieved without approximation when 36 data points have been collected. Breaking a normal into subsets can give rise to Poisson distributions, so there is risk involved with these considerations. I highlighted these with yellow rectangles around the labels.
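The Poisson-tending-to-normal claim can be eyeballed in code by comparing the Poisson pmf with mean λ against a normal density with matching mean and variance. The specific thresholds of 20 and 36 data points are this post's rule of thumb; the sketch below only shows that the gap shrinks as λ grows:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def approx_error(lam):
    """Total absolute gap between Poisson(lam) and its matching normal."""
    sigma = math.sqrt(lam)
    return sum(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, sigma))
               for k in range(int(lam + 10 * sigma) + 1))

print(approx_error(5))   # the Poisson skew is still visible
print(approx_error(20))  # noticeably closer to the normal
```

The residual gap at small λ is mostly skew, which is why subsets that shrink to Poisson scale carry skew risk.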

The skewed distribution, the green distribution, has been highlighted with the same yellow as the Poisson distributions because, having not yet achieved normality, much will change, and those changes will be rapid as normality is achieved.

The red arrows show the direction in which I expect the distributions to change. The left arrow associated with the skewed distribution only considers the movement of the foot; everything will change with the skewed distribution. The base “binomial” will most likely change and give rise to an apparent third nomial on the exterior of the aggregate distribution. The peaks associated with the down arrows can be expected to lose height, or amplitude, as more data is collected.

The median of the skew would become orthogonal. The change in its theta is not indicated on the diagram.

The intersections of the distributions will change, so they are highlighted in yellow as well.

The factor analyses also change when looked at from a horizontal-slice point of view. You can consider the factors across a horizontal slicing to differ from the factors across a vertical slicing. There would be a collection of cubes if both slices were made. Those cubes would be n-dimensional, but given our slicings are 2D, it would get messy. Cubing based on a factor analysis would be easier to operationalize in the sense of organizational design.

I labeled the slices. I had intended to provide a factor analysis for each slice. If I had the underlying data that would have been possible, but a graphical approach proved frustrating.

Next, I generated the probability of a portion of the AI slice under distribution 5, the purple distribution. A Lebesgue integral would achieve the same result.

05 Probability of a Portion of a Slice

The blue rectangle represents the probability mass under the purple distribution between the vertical constraints of the gray lines delineating that dimension of the slice AI.
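The probability mass in such a slice is just the density integrated between the two gray lines. A sketch, treating the purple distribution as a normal with made-up parameters and comparing the numeric slice against the closed form via the error function:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def slice_probability(a, b, mu, sigma, steps=10_000):
    """Riemann-style trapezoid integration of the density from a to b."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    total += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return total * h

def slice_probability_exact(a, b, mu, sigma):
    """Closed form via the error function, for comparison."""
    z = lambda x: (x - mu) / (sigma * math.sqrt(2))
    return 0.5 * (math.erf(z(b)) - math.erf(z(a)))

p_num = slice_probability(0.5, 1.5, 0.0, 1.0)
p_exact = slice_probability_exact(0.5, 1.5, 0.0, 1.0)
print(p_num, p_exact)  # both near 0.2417
```

The vertical bounds 0.5 and 1.5 stand in for the gray lines delineating slice AI; the Riemann and closed-form answers agree, as the Lebesgue approach would too.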

The author went on to give several examples of other aggregate distributions. He used these distributions to explore how the mean, median, and mode violate our expectations. So the textbook rules are violated by aggregates of underlying distributions, multiple distributions. This is true of the “binomial” example. As a rule, only consider those statistics to be valid at the level of the constituent nomials, rather than the aggregate nomial. Aggregate nomials frustrate the expected orderings of the statistical tuples.

06 Mean Mode Median

I take it that the thick black line is the mode. On the left, we get the textbook ordering. Then, in the yellow rectangle to the right of 0.5, it changes to an exceptional ordering. At some point, it changes back to the textbook ordering. And to the right of 0.75, the mean changes its tail association, becoming associated with the short tail. In the textbook ordering, the mean is in the long tail. This is where using a single number for kurtosis does not make sense. It only makes sense for a standard normal, where the tails have identical values on both sides of the 2D slice involved.
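For an actual binomial, the whole triple can be computed directly from the pmf, which is how graphs like the author's are built. A sketch, with n = 10 chosen arbitrarily:

```python
from math import comb

def binomial_pmf(n, p):
    """Full pmf of a Binomial(n, p) as a list indexed by k."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def triple(n, p):
    """Return (mean, median, mode) of a Binomial(n, p)."""
    pmf = binomial_pmf(n, p)
    mean = sum(k * q for k, q in enumerate(pmf))
    cum, median = 0.0, 0
    for k, q in enumerate(pmf):
        cum += q
        if cum >= 0.5:          # first k where the cumulative mass reaches one half
            median = k
            break
    mode = max(range(n + 1), key=lambda k: pmf[k])
    return mean, median, mode

for p in (0.25, 0.5, 0.75):
    print(p, triple(10, p))
```

Scanning p across (0, 1) this way reproduces the ordering changes the figure shows; at p = 0.5 the three statistics coincide, as they do for any symmetric case.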

The author went on to construct a distribution associated with the graph showing the tuple-ordering exceptions. In a skewed normal, the median leans over to sit on top of the mode. This is the case in the aggregate distribution used here. The ordering is not exceptional, but the lean ends not at the value of the mode but along it. Where I annotated this as exceptional, the exception is the distance from the median to the mode, not the ordering. It does, however, change the width of the separation between the median and the mode. The ordering is not symmetric around the median. The red lines are intended to show the median leaning on the mean so that the asymmetry relative to the mean, median, and mode is clear.

07 Exception Constructed Distribution

Then, I went on to explore the logic of the 2D slice. Here we are talking about the logic of the carried data, not the logic of the statistical carrier. The logic of the statistical carrier would be that of a normal distribution. With all the mathematical approximation formulas allowing us to convert from one distribution to another, we might ignore the logical constraints. I’m calling these distribution-to-distribution logical constraints the logic of the statistical carrier. The aggregation rules for a normal are an example of such carrier constraints. The carried logic is that of the collected data, rather than the collection and analysis of such data.

Logical consistency is tricky. Decades ago, consistency was a true-or-false question: was the argument consistent from top to bottom across every branch? These days that’s called absolute consistency. But now, we have relative consistency. It works from some absolute consistency out to a branch of the argument that is consistent with itself and with that base absolute consistency. Other branches would arise. Those branches would not demonstrate absolute consistency with other branches. This kind of consistency is relative consistency.

Statistically, the relative consistency would be a characteristic of each tail. Absolute consistency would be a characteristic of the core.

Relative consistency leaves us in a non-Euclidean space. That space typically would be hyperbolic involving manifolds, rather than functions. This calls into question the management practice of alignment and organizational structure.

08 Slice of Distribution and Logic

In this figure, the logic of the tails is highlighted in pink. The question marks indicate where one would define shoulders, outliers, and distant outliers.  What are your definitions of those boundaries? This is a 2D slice. Another 2D slice through the mean might require different decisions. Another slice would have a different set of curves. One of the slices would appear to be a standard normal with equal tails on both sides of its mean.

Relative consistency would start at the shoulder of a particular tail. Where you don’t differentiate the shoulders from the tails, a relative consistency starts with a particular tail. Each tail would have its own logic.

The last figure demonstrates the slices concept. The red slice is closer to a standard distribution and its tails. The blue slice is definitely skewed. The thin blue line in the core is there to hint at the lean involved in that 2D slice. The red slice does not exhibit any lean. As more data on the dimension underlying the blue baseline is collected, the lean will disappear, as will the asymmetry of the tails.

09 Slices Lean and Tails

As a manager, big data is great if you have large existing populations and large existing collections of relevant data. Continuous innovation thrives in this situation. But do be cautious of Poisson-scale subsets. And be cautious of any distribution summed into the existing normals. That data might be Poisson, and that distribution would be skewed and kurtotic, bringing you their relevant risks. Discontinuous innovations are blank-space inventions tied to an absence of any relevant populations. These innovations have tiny networks. Data collected from those networks will be small data, Poisson, pre-normal, and will move across the terrain. It will be a long time before it settles down, but at the same time, it is a long way from being a commodity, or something that orthodox management practice can handle. It is a long way from the spherical geometry of that orthodoxy. It is a long way from the Euclidean of LP2. It is hyperbolic. All that distance implies there is real economic wealth to be created, and there is plenty of time to capture it.

The data collection and relevant distributions will mature.

Snapshot statistics are not all that informative. Watch your distributions dynamically.





Box-Whisker Charts

February 12, 2018

Twitter presented me with this box-whisker chart about perceptions of probabilities. The probabilities run from the most certain to the least certain. All these probabilities could be summed into a single normal distribution. I tried to put the footprints of all the distributions into a single footprint. I don’t have the tools.

Most of the distributions are skewed, so they are ellipses.

Each of these distributions appears to be mutually exclusive.

I already knew boxplots. So I tried to grasp the shape of the distributions. I annotated the above figure as shown on the right. I hacked the notation. I’ll discuss it in detail later in this post.

There are normal (N) and skewed (SK) distributions. Each of these box charts has a pair of tails, but as I went along, I realized there are three pairs of tails. Each pair consists of a long tail (L) and a short tail (S). Once I realized there were three pairs, I used L1, L2, L3, S1, S2, and S3 to label the tails. After that, I found tail pairs that were missing a tail. I used 0 (zero) to annotate them. Later, I realized that they are really “don’t cares.” Their lengths are unknown.

The outliers are annotated with red “if”s. Including or excluding outliers should be a matter of established policy. The cost of writing code for a transient population of outliers can be needlessly high.

I read “What a Boxplot Shape Reveals About a Statistical Data Set” and found a surprise. Boxplots assume a monomial distribution. The article compares two distributions with the same boxplot. They use histograms to illustrate this problem. I’ve added the red text and the data point counts.

The distributions shown do not have enough data points to use the Poisson distribution to estimate the normals. The distributions have not yet tended to the normal, so they are skewed. The box-whisker chart would tell us more about the skew.

As I wrote this post, I looked back at the article that contained the first graph in this post. The article contained two graphs of the actual distributions summarized in those first two box-whisker charts.

In this figure, I labeled the outliers. They appear as their own distribution. I’ve also labeled the nomials.

The same labels apply to this figure, the second figure illustrating the distributions.

The next thing to look at is the normals being added together to give us those multinomial and binomial distributions. I have edited the figure to the right. I used the tails that I could see to provide the missing tail, the tail under the adjacent normal. Once all the tails have been provided, there is leftover probability mass that appears where the two normals intersect. I colored those areas blue and called them “mix,” as this is where mixture effects occur.

Later, in the upper part of the figure, I just used red Bezier curves to suggest normals. Initially, I understated the number of nomials involved. Then, I found more than one inflection point on a given tail. These bulge out at the side of the distribution. These bulges are caused by another normal inside or under the covering normal. These can oscillate in some situations. But the peak of the normal underneath is never exposed, so you wouldn’t call it a nomial.

The previous figure shows us what the statistical distributions associated with the technology adoption lifecycle (TALC) would look like. They would be a series of distributions, not a single distribution that just grows. The previous figure also looks like the pragmatism slices that comprise the TALC. Each pragmatism slice would have its own distribution. These distributions would aggregate into the TALC phase distributions.

While I was researching this, I watched a video on calculating multinomial probabilities, and the subsequent videos on the topic. It struck me that the independent, mutually exclusive probabilities used in these calculations give rise to a histogram, which in turn takes us back to the box-whisker chart and individual distributions. It also takes us back to the finite probabilities of the long tail of feature use. Once you have stable frequencies in your long tail, you would have a set of probabilities that add up to one. Changes to the UI would change the frequencies and subsequently change the probabilities.
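The multinomial calculation from those videos, and the check that a stable set of long-tail frequencies adds up to one, is short enough to sketch. The feature-use shares below are invented for illustration:

```python
from math import factorial
from itertools import product

def multinomial_prob(counts, probs):
    """P(exactly these category counts) over n = sum(counts) independent trials."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)   # multinomial coefficient n! / (k1! k2! ...)
    prob = float(coeff)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

probs = (0.5, 0.3, 0.2)  # hypothetical feature-use shares from a long tail
print(multinomial_prob((1, 1, 1), probs))  # 3! * 0.5 * 0.3 * 0.2 = 0.18

# All outcomes of n trials must sum to one
n = 3
total = sum(multinomial_prob(c, probs)
            for c in product(range(n + 1), repeat=3) if sum(c) == n)
print(total)
```

If the UI changes shift those shares, the same machinery simply runs on the new frequencies.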

The figure above does not give us any hints as to skew or kurtosis. The box-whisker chart can provide some of that information. In the earlier histograms, the one on the right shows that we have a binomial distribution. The data sets for those distributions have too few data points, so those distributions would be skewed. The peaks are medians. Those medians lean; they are not perpendicular to the x-axis. The lean pushes the mean and mode apart. With a few statistics beyond what the box-whisker chart is telling you, you will be able to determine how many nomials are involved.

Analysing a Box-Whisker Chart


We start with a box-whisker chart for a normal distribution.



Then, we examine the symmetries with two tests: the core test (A) and the tail test (B). Before confirming normality via the core test, the red line, the median, would be black.

To do the core test, we draw 45-degree lines across both boxes from the shared location where the median intersects a rectangle containing the two boxes, as shown. If the lines intersect the opposite corners, the boxes are squares. This implies that the boxes are the same size and that the distribution represented by the box-whisker chart is symmetric in terms of its core. If the diagonal lines intersect the sides at the same height, again, the distribution is symmetric.

Next, we do the tail test. We measure both tails to determine whether they are equal or unequal in length.

If the box-whisker chart passes both tests, the distribution represented by the chart is symmetric, which in turn tells us that the distribution is normal (N). I annotate normal distributions with a red capital N. I also show the median as a red line at an angle of 90 degrees. The median does not lean in symmetric normals.

I used tick marks to indicate that the whiskers are the same length, as is done in geometry.
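The two tests translate directly into code: compare the two halves of the box for the core test, and the two whiskers for the tail test. A minimal sketch; the tolerance, the quartile method, and the whiskers-as-min/max simplification are my assumptions, while the N/SK labels follow the notation above:

```python
import statistics

def classify(data, tol=0.1):
    """Label a dataset N (symmetric, normal-looking) or SK (skewed),
    using the core test (box halves) and the tail test (whiskers)."""
    q1, med, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    core_ok = abs((q3 - med) - (med - q1)) <= tol * iqr              # core test (A)
    tail_ok = abs((max(data) - q3) - (q1 - min(data))) <= tol * iqr  # tail test (B)
    return "N" if core_ok and tail_ok else "SK"

print(classify([1, 2, 3, 4, 5, 6, 7, 8, 9]))      # symmetric core and whiskers: N
print(classify([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))  # long right whisker: SK
```

Note that the second dataset passes the core test and fails only the tail test, which is why both tests are needed before declaring normality.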

In this figure, we examine a box-whisker chart for a skewed normal. The boxes are not the same size. Doing the core test, we find that the line for the left box intersects the box higher than the line for the right box. This demonstrates that the boxes are skewed. This was labeled with a red “SK.” Since we know the distribution is skewed, we can lean the median by taking the median’s cosine; this gives us the length of the median. As we angle this median, it will contact either the mode or the mean, depending on which tail is long. Here we left the whiskers the same length. We labeled the long side (L) and the short side (S). I then drew the shape of the distribution in blue based on the information from the box-whisker chart alone.

Theta is the angle with which the median was leaned.

The mode and the mean are the same as the median in an unskewed normal. They separate symmetrically around the median in a skewed normal. They are shown, for this illustration only, as short, vertical black lines inside the box.

In box-whisker charts, the median is usually shown as a thicker line.

In this figure, we look for three pairs of symmetries. The distribution is normal, so the core and tail in each pair are the same length. This will not necessarily be the case with a skewed distribution.

I did not measure the outlier distances earlier. This is where that happens. If, where d is the distance function or metric, d(a,b) = d(a,c), d(a,d) = d(a,e), and d(a,f) = d(a,g), then the distribution pairs are normal. Otherwise, the unequal pairs are skewed, so they would have unequal core widths or tails.

Once we know which core widths or tails are long and which are short, we label them. Here the left core is narrower than the right core. All the other lengths are the same, but the asymmetry of the core makes all the tails on the left shorter in aggregate than those on the right. The summary notation of S and L was enough to convey all the relationships between the pairs of tails. The numbered notation gets more complicated later. Nothing guarantees a nice orderly set of relationships. Folding at the median will be informative in some cases.

In this figure, we get a messy ordering of the relationships. I’ve added some notation. Where you move from short to long or long to short, the tails protrude. If everything on one side is short and everything on the other side is long, protrusions are less likely. They are not impossible because of the relative nature of shorts and longs.

Swaps are fairly active things, so they constitute a sensitivity driving kurtosis risk.

I connected these swaps on one tail. The other tail is swapped as well. Here SW23 means there was a swap in the second pair of tails and another swap in the third pair of tails. The cores are just the first pair of tails. SW23 just condenses SW2 and SW3.

The next figure is a mess. Three measurements, members of each tail pair, are missing: S1, S2, and S3. The thick line on the rectangle is the perpendicular median. The only whisker is to the right, so that is where the long tail goes, to the right. That means the median leans left. There are no outliers to the right, so S3 does not exist. I use zero to indicate non-existence.

Every outlier has been labeled with a red “if.” Every outlier causes us to consider whether to leave it in or take it out. The further it is from the mean, the more likely it will be eliminated. But standing business rules are better than ad-libbing here. Establish policies. Outliers are costly to serve.

In this figure, I have annotated the curvature of one of the pairs of tails. Given three pairs of tails, there would be three toruses that could be generated by revolving the curvatures around the mean. A curvature is the reciprocal of a radius. This implies that high curvatures are tighter and smaller than low curvatures. The small orange circle has a tight curvature. The large orange circle has a looser curvature. A 2-D slice is shown. In n-D or 3-D, the two circles are part of the same torus revolving around the core of the normal. The surface would be smooth and continuous.

A tight curvature corresponds to a short tail as it is tangent to that tail. A loose curvature corresponds to a long tail as it is tangent to that long tail. As the distribution approaches normality, the curvatures equalize to some average curvature. The circles become the same size on both sides of the distribution in its 2-D view or slice. The curvatures of the standard normal are the same on both sides of the distribution.
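The tight/loose pairing can be checked numerically: the radius of curvature of a curve y(x) is (1 + y'²)^(3/2) / |y''|. Using a gamma density as a stand-in skewed distribution (my choice, not from the charts above), the radius at matching heights comes out larger on the long right tail:

```python
import math

def gamma3_pdf(x):
    """Gamma(shape=3, scale=1) density: a simple skewed stand-in."""
    return 0.5 * x * x * math.exp(-x)

def radius_of_curvature(f, x, h=1e-4):
    """R = (1 + y'^2)^(3/2) / |y''|, via central differences."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    return (1 + d1 * d1) ** 1.5 / abs(d2)

def bisect(f, lo, hi, target, increasing):
    """Find x in [lo, hi] with f(x) == target; f is monotone on the interval."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

half = 0.5 * gamma3_pdf(2.0)  # half the peak height; the mode sits at x = 2
left = bisect(gamma3_pdf, 0.0, 2.0, half, increasing=True)     # short-tail side
right = bisect(gamma3_pdf, 2.0, 12.0, half, increasing=False)  # long-tail side
print(radius_of_curvature(gamma3_pdf, left))
print(radius_of_curvature(gamma3_pdf, right))  # the long tail carries the larger radius
```

As the distribution tends to the normal, the two radii would converge, matching the equalizing curvatures described above.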

Big data ultimately comes down to Markov chains that sequence individual distributions together. The original charts demonstrate how meaning is particular to place. Upton says as much in his “The Aesthetic of Play,” as did Moore in “Crossing the Chasm.”