Archive for May, 2019


May 13, 2019

When statistics was invented, it was based on some simple math, the mathematics of distance. The world was Euclidean. Truth was binary. Inference was based on normal distributions, areas under the curves, and distances. Those normals were symmetric. There were no long tails, and no short tails. Pi was a constant.

Now, we have Poisson distributions, small data, and big data. We have hyperbolic spaces, Euclidean spaces, and spherical spaces among many spaces. We have linear spaces and non-linear spaces. We have continuous spaces and finite spaces. Truth is no longer binary. Inference is still based on normal distributions. Those normals become symmetric. Skewness and kurtosis give us long tails and short tails. Pi is variable. And, the total probability mass is tied to pi, so it is also variable running from less than one to more than one.

The number of data points, n, drives our distributions differentially. “We are departing our estimated normal. But, we will be travelling through a skewed normal for a while.” You have to wonder if that probability mass is a gas or a solid. Is the probability mass laid out in layers as the modes and their associated tails move?

It’s too much, but the snapshot view of statistics lets us ignore much, and assume much.

This figure started out showing what a normal distribution in the Lp geometry looks like when p = 0.5. This is shown in blue. This is a normal in hyperbolic space. The usual normal that we are familiar with happens in L2 space, or Lp space where p = 2. This is the gray circle that touches the box that surrounds the distribution. That circle is a unit circle of radius 1.

The aqua blue line in the first figure shows the curve of, say, p=0.1. The figure immediately above shows what happens as p increases: the line approaches and then exceeds the p=2 curve. At p=1, the line would be straight, and we would have a taxicab geometry. The value of p can exceed p=2. When it does so, the distribution has entered spherical space. The total probability density equals 1 at p=2. It is more than 1 when p<2. It is less than 1 when p>2.
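The curves described above can be sketched numerically. This is a minimal sketch, assuming the curves in the figures are Lp unit circles, |x|^p + |y|^p = 1, which is the usual definition but is not stated in the post. The function name lp_unit_curve and the sample x values are hypothetical, chosen only for illustration.

```python
def lp_unit_curve(p, xs):
    """Return the upper-half y value on the Lp unit circle
    |x|^p + |y|^p = 1 for each x in xs (assumes 0 <= |x| <= 1)."""
    return [(1.0 - abs(x) ** p) ** (1.0 / p) for x in xs]


# Sample x positions between the origin and the edge of the unit box.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]

# p = 0.5 bows in toward the origin, p = 1 is the straight taxicab line,
# p = 2 is the familiar circle, and p = 4 bulges out toward the box.
for p in (0.5, 1.0, 2.0, 4.0):
    ys = lp_unit_curve(p, xs)
    print(p, [round(y, 3) for y in ys])
```

At p = 1 the y values fall off linearly with x, which is the straight line mentioned above. For p below 1 the curve sags toward the origin, and for p above 2 it pushes out past the circle toward the corners of the box.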

The process starts with that Dirac function where the line goes to infinity. Then, the probability mass flows down into an eventual normal. That eventual normal traverses across the Lp geometries. The geometry is hyperbolic until the Lp geometry reaches L2, where p=2. The total probability mass is more than one. The L2 geometry is the standard normal. In the L2 geometry, the total probability mass is one. Then the Lp geometry exceeds p=2. This is the spherical geometry where the probability mass migrates to the shell of the sphere leaving the core empty. At this point the total probability mass is less than one.

Notice that the early adopter phase of the technology adoption lifecycle happens when the total probability mass is more than one. And, the late mainstreet and later phases happen when the total probability is less than one. These changes in geometry mess with our understanding of our financial projections. That early adopter phase is relative to discontinuous innovations, not continuous innovations, as the latter happen in the mainstreet or later phases. That early adopter phase is commonly characterized as being risky, but this happens because hyperbolic spaces suppress projections of future dollars, and because of the problems of investing in skewed distributions, where the long tails contract while the short tails remain anchored. Because the probability mass is more than one while we assume it is one, we understate the probabilities of success. Our assumptions have us walking away from nice upsides.

All these changes happen as the number of data points, n, increases.

The distribution started when we asserted the existence of a stochastic variable. This action puts a Dirac function on the red plus sign that sits at the initial origin, (0,0), of the unit circle of the distribution. This value for the origin at n=0 should appear in black, which is used here to encode the observable values of the parameter.

Watch the following animation. It shows how the footprint of the distribution changes as n increases. The distribution comes into existence and then traverses the geometries from the origin to the distant shell of the eventual sphere. This animation shows how the normal achieves the unit circle once it begins life from the origin, and traverses from hyperbolic space to Euclidean space.

In the very first figure, the light gray text lists our assumptions. The darker gray text lists observations from the figure. The origin and the radius are such observables. The red text shows implied values. We assume a normal, so the mean and the standard deviation are implied from that. The black text shows knowns given that the distribution is in hyperbolic space.

The color codes are a mess. It really comes down to assertions cascading into other assertions.

The thick red circle shows us where the sample of the means happens as n increases. We have a theoretical mean for the location of the origin that needs to be replaced by an actual mean. Likewise, we have a theoretical standard deviation. That standard deviation controls the size of the distribution, which will move until normality is achieved in each of the two underlying dimensions. Notice that we have not specified the dimensions. And, those dimensions are shown here as not having skew. We assumed the normal has already achieved normality.

OK. So what?

We hear about p-hunting and about statistical significance parameters that do not actually represent anything about the inferences being made these days. But, hyperbolic spaces are different in terms of inference. The inference parameters α and β are not sufficient in hyperbolic space, as illustrated in the following figures.

Overlapping based on the Assumed Normals
Overlapping of the Hyperbolic Tails

In the figures, I did not specify the α and β values. The red areas would be those specified by the α and β values, so they would be smaller than the areas shown. I’ll assume that the appropriate values were used. But in the first diagram, there would be statistical significance where there is no data at all. In the second diagram, the statistical significance would again be based on the asserted normal, but the results would still include some data from the hyperbolic tails, though not much.

The orientation of the tails would matter in these inferences. That requires more than a snapshot view. The short tails of a given dimension orient the distribution before normality is achieved. Given the dependence of this orientation on the mode, and given that a normal distribution has many modes over its life, orientation is a hard problem. Yes, asserting normality eliminates many difficulties, but it hides much as well.

As product managers, we assume much. Taking a differential view will help us make valid inferences. And, betting on the short tails, not the long tails, will save us time and effort. We do most of our work these days in the late mainstreet or later phases. Statistics is actually on our side because the probabilities are higher than we know, and there are multiple pathways or geodesics that we can follow.


A Quick Skewness

May 4, 2019

When I get a tweet about one of John Cook and associates’ blog posts, I dive in. In this latest dive, I take a look at his post on “Duffing equation for [a] nonlinear oscillator.” In this post, John starts with an equation that is linear at one value of a parameter and non-linear at another value. Good to know.

The link in that blog post led to another blog post, “Obesity index: Measuring the fatness of probability distribution tails,” which gave rise to Rick Wicklin’s comment that mentions quartile-based skewness. Skewness is usually based on moments. Rick’s comment links to another blog post, “A quantile definition for skewness,” that discusses quartile-based skewness.

The usual definition of skewness is the Pearson definition, which uses the third standardized moment of the distribution, typically described by a probability density function (pdf) or a cumulative distribution function (cdf). The zeroth moment of the distribution is the total probability. The first moment is the mean. The second central moment is the variance. The third standardized moment is skewness. The fourth is the kurtosis. There are two more moments following those. And, there can be more moments beyond those. Or, just say there are moments all the way down, down to zero, if we are that lucky.
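For a finite sample, the first four of these quantities can be computed directly. The following is a minimal sketch using the population (divide by n) form of each estimate; the function name moments and the sample data are hypothetical, and statistical libraries often apply bias corrections that this sketch leaves out.

```python
import math


def moments(data):
    """Return mean, variance, skewness, and kurtosis of a sample,
    using population (divide-by-n) central moments."""
    n = len(data)
    mean = sum(data) / n
    # Central moments: averages of powers of the deviations from the mean.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    sd = math.sqrt(m2)
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / sd ** 3,   # third standardized moment
        "kurtosis": m4 / m2 ** 2,   # fourth standardized moment
    }


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(moments(symmetric))
print(moments(right_skewed))
```

The symmetric sample has a skewness of zero, while the sample with one large value on the right has a positive skewness.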

The quartile definition of skewness bypasses all that calculus. The quartile definition discussed in that post is known as Bowley or Galton skewness.

Keep in mind, throughout this post, that we are talking about a skewed normal. The skewed normal is asymmetric. The individual quartiles are not unit measures. The data is divided into four groups of equal count, but for the skewed normal the widths of those groups are not uniform. So I drew the quartiles as having a random median (Q2). Then, that leaves us with Q0, Q1 and Q3, Q4 as being arbitrary. Once the distribution achieves normality, the quartile widths would mirror each other about the median, the distribution would be symmetrical, the skew would be zero, and the mean, median, and mode would converge to the same value.
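The uneven quartile widths can be checked on a small sample. This is a minimal sketch, assuming the "inclusive" quantile method from Python's statistics module (other methods give slightly different cut points). The function name quartile_widths and both samples are hypothetical.

```python
import statistics


def quartile_widths(data):
    """Return the four widths Q1-Q0, Q2-Q1, Q3-Q2, Q4-Q3,
    where Q0 is the minimum and Q4 is the maximum."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    q0, q4 = min(data), max(data)
    return [q1 - q0, q2 - q1, q3 - q2, q4 - q3]


right_skewed = [1, 2, 2, 3, 3, 4, 5, 7, 12]
symmetric = [1, 4, 5, 6, 7, 8, 9, 10, 13]

print(quartile_widths(right_skewed))  # widths grow toward the long tail
print(quartile_widths(symmetric))     # widths mirror about the median
```

In the skewed sample the widest quartile sits on the long-tail side. In the symmetric sample the widths on either side of the median match.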


You may have seen these quartiles expressed as a box-whisker plot.

box and whisker plot

These box-whisker plots show up in daily stock price graphs. That hints towards investment considerations. We won’t use that language here.

What we need is the quartile correlations in box-whisker plots so we can move on to the calculation. And, so we can develop some intuitions about skewness and skewed distributions.

horizontal box and whisker plot - labelled

A more stock market view of a box-whisker plot follows.

vertical box and whisker plot - labelled

Notice that skewed distributions are described by their medians. A normal, assuming a skewness of 0, is symmetric. Skewed distributions have yet to achieve normality. Once normality is achieved, the mean, the median, and the mode converge to the same value. In the skewed normal that has yet to achieve normality, these forms of the average are distinct. The mode and the mean move away from the median and sandwich the median between them. In my posts here, I’ve described the skewed normal as having a median that rests on the mode and lies at some angle based at the mean.
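The sandwiching of the median can be seen in a small sample. This is a minimal sketch with hypothetical data; the ordering mode < median < mean is typical of a positively skewed, single-peaked sample, but it is not guaranteed for every skewed distribution.

```python
import statistics

# A hypothetical positively skewed sample: most values are small,
# with a few large values stretching the right tail.
data = [1, 2, 2, 2, 3, 3, 4, 5, 7, 12]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle of the sorted data
mean = statistics.mean(data)      # pulled toward the long tail

print(mode, median, mean)
```

Here the median sits between the mode and the mean, with the mean pulled furthest toward the long tail.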

box and whisker plot with skewed distribution

Here I’ve shown the association between the box-whiskers plot and the underlying skewed distribution. I’ve shortened the whiskers portion of the box-whiskers plot so it lines up with the range of the distribution. The tails are at Q0 and Q4. In this figure, the short tail is at Q0. In skewed distributions the long tail will be on the opposite side of the median. 

On any dimensional axis, such as the two-dimensional projection along the x-axis, the short tail is anchored and the long tail contracts towards the short tail as the distribution achieves normality as the number of data items, n, increases. Once n is greater than one, the projection down to the x and y axes compresses the tails, so the tail on the opposite side of the median is longer associated.

Here we see what negative and positive skewed distributions look like. The distribution we associated with our box-whisker plot was positively skewed. Notice the gray dashed lines inside each distribution. They show us what the normals would look like once normality is achieved. Again, the short tail is fixed or anchored. The short tails do not move. The long tails contract towards the short tail.

The figures don’t show us how normalization would change the shape of the normals. They would be taller. The volumes under the curves would not change as the shape of the distributions change.

But here is the thing, we are investing over time. Over time, the distribution would become symmetrical. Money spent near the short tail would be conserved while money spent out near the long tail would be lost. Functionality serving customers in the long tail would be stranded. Given that we deal with statistics on the basis of snapshot pictures, and that we assume normality, we wouldn’t see why we lost the money and time we spent out on the long tail. We might not realize that our operational hypotheses are no longer valid.

So back to Bowley skewness, one of the quick, no-calculus-involved ways to calculate skewness.

Skewness = ((Q3-Q2)-(Q2-Q1))/(Q3-Q1)
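The formula above translates directly into a few lines of code. This is a minimal sketch, assuming the "inclusive" quantile method from Python's statistics module for Q1, Q2, and Q3; the function name bowley_skewness and the samples are hypothetical.

```python
import statistics


def bowley_skewness(data):
    """Quartile-based (Bowley) skewness:
    ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    if q3 == q1:
        # The interquartile range is zero, so the ratio is undefined.
        raise ValueError("Q3 equals Q1; Bowley skewness is undefined")
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


right_skewed = [1, 2, 2, 3, 3, 4, 5, 7, 12]
left_skewed = [-x for x in right_skewed]
symmetric = [1, 4, 5, 6, 7, 8, 9, 10, 13]

print(bowley_skewness(right_skewed))  # positive
print(bowley_skewness(left_skewed))   # negative
print(bowley_skewness(symmetric))     # zero
```

The result always falls between -1 and 1. Because it uses only the three inner quartiles, it ignores how far the extreme values at Q0 and Q4 reach.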

So at the end of this post, we stepped out for a short walk. We walked around a familiar neighborhood and found some interesting things, and we found some confirmations for a few intuitions. Anything can take you to serendipity and surprise. Enjoy.