Yes or No in the Core and Tails

Right now, I’m looking at how many data points it takes before a dataset achieves normality. I’m using John Cook’s binary outcome sample size calculator and correlating those results with z-scores. The width of the interval matters. The smaller the interval, the larger the sample needed to resolve a single decision. But once you make the interval wide enough to reduce the number of samples needed, the decision tree is wider as well. The ambiguity seems to be a constant.

A single-bit decision requires a standard normal distribution with an interval centered at some z-score. For the core, I centered at the mean of 0 and began with an interval between a=-0.0001 and b=+0.0001. That gives you a probability of about 0.0001. It requires a sample size of 1×10^8, or 100,000,000. So Agile that. How many customers did you talk to? Do you have that many customers? Can you even do a hypothesis test with statistical significance on something so small? No. This is the reality of the meaninglessness of the core of a standard normal distribution.
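The arithmetic behind those two numbers can be sketched in a few lines. This is a rough check, not a reproduction of the calculator: the function names are my own, and the sample size uses the generic textbook formula for estimating a proportion to within a margin at 95% confidence, which is an assumption about what a binary outcome calculator does. The exact interval mass comes out near 0.00008, which rounds to the 0.0001 above, and the sample size lands on the order of 10^8.

```python
from math import ceil
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def interval_probability(a: float, b: float) -> float:
    """Probability mass of the standard normal between a and b."""
    return std_normal.cdf(b) - std_normal.cdf(a)


def binary_sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Textbook sample size for estimating a proportion p to within +/- margin.

    Assumption: this is the generic formula n = z^2 * p * (1 - p) / margin^2,
    not necessarily the formula the calculator mentioned in the text uses.
    """
    z = std_normal.inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value
    return ceil(z * z * p * (1.0 - p) / (margin * margin))


core_mass = interval_probability(-0.0001, 0.0001)
core_n = binary_sample_size(margin=0.0001)
print(f"interval mass: {core_mass:.6f}")
print(f"sample size:   {core_n:,}")
```

Widening the margin by a factor of ten cuts the sample size by a factor of one hundred, which is the trade between interval width and sample size described above.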

Exploring the core, I generated the data that I plotted in the following diagram.

With intervals across the mean of zero, the sample size curve is asymptotic at the mean. The smallest interval required the largest sample size. As the interval gets bigger, the sample size decreases. Bits refers to the bits needed to encode the width of the interval. The sample size can also be interpreted as a binary decision tree. That is graphed as a logarithm, the Log of Binary Decisions. This grows as the sample size decreases. The number of samples required to make a single binary decision is vast, while a decision about subtrees requires fewer samples. You can download the Decision Widths and Sample Sizes spreadsheet.

I used this normal distribution calculator to generate the interval data. It has a nice feature that graphs the width of the intervals, which I used as the basis of the dark gray stack of widths.

In the core, we have 2048 binary decisions that we can make with a sample size of 31. We only have probability density for 1800. 248 of those 2048 decisions are empty. Put a different way, we use 11 bits, or binary digits, bbbbbbbbbbb, to address 2^11 decisions, but we have don’t cares at 2^7, 2^6, 2^5, 2^4, and 2^3, because 248 = 2^7 + 2^6 + 2^5 + 2^4 + 2^3. This gives us bbb*****bbb, where each b can be a 0 or 1. The value of each don’t care would be 0 or 1, but its meaning would be indeterminate. The don’t cares let us optimize, but beyond that, they happen because we have a data structure, the standard normal distribution, representing missing but irrelevant data. That missing but irrelevant data still contributes to achieving a normal distribution.
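The count of empty decisions, and the powers of two it breaks into, is easy to check. A minimal sketch, with variable names of my own choosing:

```python
total_decisions = 2 ** 11              # 2048 leaves in a full 11-bit tree
populated = 1800                       # decisions that carry probability density
empty = total_decisions - populated    # leaves with nothing behind them

# Exponents of the powers of two that sum to the empty count,
# highest first. These are the don't-care positions.
dont_care_exponents = [k for k in range(10, -1, -1) if (empty >> k) & 1]

print(empty, bin(empty), dont_care_exponents)
# → 248 0b11111000 [7, 6, 5, 4, 3]
```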

My first hypothesis was that the tail would be more meaningful than the core. This did not turn out to be the case. It might be that I’m not far enough out on the tail.

Out on the tail, a single-bit decision on the same interval centered at x=0.4773 requires a sample size of 36×10^6, or 36,000,000. The peak of the sample size is lower in the tail. Statistical significance can be had at 144 samples.

When I graphed the log of the sample sizes for the tail and the core, they were similar, not as different as I had expected.
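The similarity shows up even in the two peak sample sizes quoted above. A quick sketch, assuming base-2 logarithms (the base used in the graph is not stated, so that is my assumption); the two values come out at about 26.6 and 25.1:

```python
from math import log2

core_n = 100_000_000   # peak sample size in the core, from the text
tail_n = 36_000_000    # peak sample size in the tail, from the text

core_log = log2(core_n)
tail_log = log2(tail_n)

print(f"core: {core_log:.2f}  tail: {tail_log:.2f}  gap: {core_log - tail_log:.2f}")
```

The raw sample sizes differ by 64 million, but on a log scale the gap is about a bit and a half.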

I went back to the core and drew a binary tree for the sample size, 2^11, and the number of binary decisions required. The black base and initial branches of the tree reflect the definite values, while the gray branches reflect the indefinite values, or don’t cares. The dark orange components demonstrate how a complete tree requires more space than the normal. The light orange components are don’t cares of the excess space variety. While I segregated the samples from the excess space, they would be mixed in an unbiased distribution.

The distribution as shown would be a uniform distribution. The data in a normal would occur with different frequencies. They would appear as leaves extending below what is now the base. Those leaves would be moved from the base, leaving holes. Those holes would be filled with orange leaves.

Given the 2^7, 2^6, 2^5, 2^4, and 2^3, there is quite a bit of ambiguity as to how one would get from 2^8 branches to 2^2 branches of the tree. Machine learning will find them. 1980s artificial intelligence would have had problems spanning that space, that ambiguity.

So what does it mean to a product manager? First, avoid the single-bit decisions because they will take too long to validate. Second, in a standard normal the data is evenly distributed, so if some number of samples occupies less than the space provided by 2^x bits, they wouldn’t all be in the tail. Third, you cannot sample your way out of ambiguity. Fourth, we’ve taken a frequentist approach here; you probably need to use a Bayesian approach. The Bayesian approach lets you incorporate your prior knowledge into the calculations.
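To make the Bayesian point concrete, here is a minimal sketch of the simplest case: a Beta prior over a yes/no rate, updated with observed answers. Everything in it is hypothetical, including the class name, the prior pseudo-counts, and the interview results; it is one common way to fold prior knowledge into a binary outcome, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about a yes/no rate, held as Beta(alpha, beta) pseudo-counts."""
    alpha: float  # weight of prior and observed "yes" answers
    beta: float   # weight of prior and observed "no" answers

    def update(self, yes: int, no: int) -> "BetaBelief":
        # Conjugate update: observed counts are added to the pseudo-counts.
        return BetaBelief(self.alpha + yes, self.beta + no)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior: past experience worth about ten interviews, leaning yes.
prior = BetaBelief(alpha=6.0, beta=4.0)

# Hypothetical data: 31 customer interviews, 20 yes and 11 no.
posterior = prior.update(yes=20, no=11)

print(f"prior mean: {prior.mean:.3f}  posterior mean: {posterior.mean:.3f}")
```

The prior acts like samples you already have, which is why a modest number of new observations can move a decision that a purely frequentist test would leave unresolved.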

Enjoy.