The whole mess that I mentioned in Yes or No in the Core and Tails II kept bothering me. Then I realized that the order of the decisions didn’t matter: I can move the don’t cares to the bottom of my tree. It took a while to revise the tree. In the meantime, I read Part 2 of the Visual Introduction to Machine Learning, which led me to believe that moving the don’t cares was the correct thing to do.

The figure is too small to see, but it is a complete binary tree of size 2^{11}, which takes us to 2048 bits, or a sample size of n=2048. Notice that we achieve normality at n=1800. This situation should present us with a skewed normal, but somehow the distribution is not skewed according to John Cook’s binary outcome sample size calculator. Of course, I’m taking his normality to mean standard normal. Those five layers of don’t cares give us a probability of 1/32, or p = 0.03125, at each branch at 2^{6}. Or, using the number from the higher-density portion of the tree, 1800/2048 = 0.8789, or the number from the lower-density portion of the tree, 248/2048 = 0.1211. No, I’m not going to calculate the kurtosis. I’ll take John’s normal to be a standard normal.
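The arithmetic in that paragraph can be checked with a few lines of Python. This is only a sketch: the variable names are mine, and the split at n=1800 is taken from the text as given.

```python
# Arithmetic from the paragraph above; all names are illustrative.
depth = 11
leaves = 2 ** depth                      # 2048 bits, n = 2048

dont_care_layers = 5
p_branch = 1 / 2 ** dont_care_layers     # 1/32

n_normal = 1800                          # where the post says normality is reached
high_density = n_normal / leaves
low_density = (leaves - n_normal) / leaves

print(leaves)                  # 2048
print(p_branch)                # 0.03125
print(round(high_density, 4))  # 0.8789
print(round(low_density, 4))   # 0.1211
```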

The neural net lesson made a nice point, summed up by the figure about bias and variance. Yes, we are not doing machine learning, but another term for the same thing is statistical learning. We have the same problems with the statistical models we build for ourselves. We have bias and variance in our data depending on how we define our model, that is, on what correlations we use to define our model.

Model complexity is inversely related to bias: as complexity goes up, bias goes down. And model complexity is directly related to variance: as complexity goes up, so does variance. Part 2 of the Visual Introduction to Machine Learning explains this in more depth if you haven’t read it yet.
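A toy version of that tradeoff might look like the sketch below. The two curves are assumptions I made up for illustration, not the curves in the figure; the only point is that one falls with complexity, the other rises, and their sum has a minimum somewhere in between.

```python
# Toy curves, assumed for illustration only (not taken from the figure).
def bias_squared(complexity):
    return 1.0 / complexity          # falls as the model gets more complex

def variance(complexity):
    return 0.04 * complexity         # rises as the model gets more complex

def total_error(complexity):
    return bias_squared(complexity) + variance(complexity)

complexities = range(1, 21)
best = min(complexities, key=total_error)

print(best)                          # 5
print(round(total_error(best), 2))   # 0.4
```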

Watch the zedstatistics series on correlation. It will take some time to see how his models changed their definitions over the modeling effort. He is seeking that minimum error optimization shown in the figure. Much of it involves math, rather than data.

Given that we have pushed our don’t cares down below our cares, we set ourselves up in a sort of Cauchy distribution. Cauchy distributions have thicker tails than normals as shown in the normal on the right. In some sense, the tail thickness is set by moving the x-axis of the normal down. Here we did that by some epsilon. In a marketing sense, that would be an upmarket move without renormalization. But, in our “don’t care” sense the don’t cares are defining the thickness of that epsilon.

With the normal distribution shown on the right, we are defining our knowns as what we got from our sample, our sort of knowns as the space of the don’t cares, and our unknowns as the yet to be surveyed populations. The sort of knowns represent our tradeoffs. We had to choose a path through the subtree, so we had to ignore other paths through the subtree. There were 32 paths, or 2^{5} paths, of the 2^{11} paths. Keep in mind that the don’t cares don’t mean we don’t care. Don’t cares allow us to solve a problem with a more general approach, which we usually take to minimize costs. But, in the marketing sense, it’s more that we didn’t ask yet. Once we ask and get a firm determination, we firm up one path from the 32 possible paths. We can use don’t cares to move forward before we have a definitive answer.
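The way five don’t cares open up 32 paths, and the way each firm answer narrows them, can be sketched as follows. The answers in the loop are hypothetical, and `firm_up` is a helper name I invented for the illustration.

```python
from itertools import product

DONT_CARE_LAYERS = 5

# Each yes/no assignment to the five undecided layers is one path through the subtree.
paths = list(product("YN", repeat=DONT_CARE_LAYERS))
print(len(paths))  # 32

def firm_up(candidates, layer, answer):
    """Keep only the paths consistent with a firm answer at one layer."""
    return [p for p in candidates if p[layer] == answer]

# Hypothetical answers, obtained one layer at a time as we finally ask.
remaining = paths
counts = []
for layer, answer in enumerate("YNYYN"):
    remaining = firm_up(remaining, layer, answer)
    counts.append(len(remaining))

print(counts)      # [16, 8, 4, 2, 1]
print(remaining)   # [('Y', 'N', 'Y', 'Y', 'N')]
```

Until every layer is answered, more than one path stays open, which is what lets us move forward before we have a definitive answer.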

But, the bias and variance figure tells us something else. It tells us where in the machine learning sense the ideal solution happens to be. It is at the minimum of a parabola. In the frequentist sense, that minimum defines a specific standard deviation, or in the approach to the normal sense, that minimum tells us where our sample has become normal. It also tells us where we have become insensitive to outliers.

Once we have found the minimum, we have to realize that minimum in the development or definitional effort. Agilists would stop when they reach that minimum. Would they realize that they reached it? That is another matter. Ask if they achieved normality or not. But, the goal of machine learning is to approximate a solution with limited data, or to approximate the parabola with a limited number of points on the parabola. Once you’ve approximated the parabola, finding the minimum is a mathematical exercise.
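As a sketch of that last step, three points are enough to pin down a parabola exactly, and the minimum then falls out of the coefficients. The sample points below are hypothetical, and a real error curve is only roughly parabolic, so treat this as the idealized case.

```python
def fit_parabola(p1, p2, p3):
    """Return (a, b, c) for y = a*x**2 + b*x + c through three points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    c = (x2 * x3 * (x2 - x3) * y1
         + x3 * x1 * (x3 - x1) * y2
         + x1 * x2 * (x1 - x2) * y3) / denom
    return a, b, c

def vertex(a, b, c):
    """Return (x, y) of the parabola's vertex; it is a minimum when a > 0."""
    x = -b / (2 * a)
    return x, c - b**2 / (4 * a)

# Hypothetical (complexity, error) observations.
a, b, c = fit_parabola((1, 6), (2, 3), (5, 6))
x_min, y_min = vertex(a, b, c)
print(a, b, c)        # 1.0 -6.0 11.0
print(x_min, y_min)   # 3.0 2.0
```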

We can represent the product as a line through that minimum. That line would represent the base of a decision tree. I’ve represented these decision trees as triangles. Those triangles are idealizations. A generative effort in a constraint space is much messier than a triangle would suggest.

I’ve annotated the bias and variance graph with such a line. I’ve used a straight line to represent the realization. Every realization has an ontology representing the conceptualization to be realized. Every realization also has a taxonomy, but only after the realization. It boils down to ontologies before and taxonomies after. In the figure, the line from the minimum error to the baseline of the bias and variance graph is the target of the development effort. The realization line was projected and redrawn to the right. Then, the ontology and the taxonomy were added. Here the ontology and the taxonomy are identical. That is far from reality. The ontology and the taxonomy are symmetrical here, again far from reality.

The figure below the one on the right shows a messier view of a realization to be achieved over multiple releases. The solid red line has been released. There is an overall taxonomy, the enterprise taxonomy. And, there is the taxonomy of the user. The user’s effort generates some value that is significant enough to warrant continued development of the intended realization, shown as the red striped line. The user’s taxonomy is limited to the user’s knowledge of the carried content. The user’s knowledge might need to be enhanced with some training on the underlying concept. The user may not know the underlying conceptual model defined in the ontology. The developers might not know the underlying conceptual model either.

We cannot feed an ontology to a neural network. And, that neural network won’t discover that ontology. When Google wrote that Go-playing application, it discovered a way to play Go that no humans would have discovered. There are more ways to get to a realization than through ontologies and taxonomies.

The value of a realization is achieved by projecting effort through the realization. That value is evaluated relative to a point of value. That value is evaluated by some valuation baseline. Different managers in an enterprise would have different views of the realization, and different valuation baselines.

The symmetries, asymmetries, and axes of those symmetries that I highlighted are significant indicators of what must be learned, and who must learn what is being taught. Value realization is tied to what must be taught. The need to teach, like the need to design interfaces, is a signal that the underlying ontology was not known to the users, and was not known and subsequently learned by the developers. The need to teach and design shows up more in products designed for sale or external use.

So what is a product manager to do? Realize that the number of samples is much larger than what Cook’s formula tells us the minimum number of samples would be. Don’t cares are useful minimizations. There is one ontology and many taxonomies. Agile assumes that the ontology will be discovered by the developer. When the UI is not straightforward, the ontology has been departed from. And, there are many views of value and many valuation baselines.

Enjoy.
