## Archive for July, 2018

### Bias

July 23, 2018

Tonight, a tweet from @CompSciFact led me to a webpage, “Efficiently Generating a Number in a Range.” In a subsection titled “Classic Modulo (Biased),” the author mentions how the not generating the entire base of the binary tree when seeking a particular range makes the random number biased. I came across this but didn’t have a word for it when I was trying to see how many data points I would need to separate a single binary decision. I wrote about this in TrapezoidsYes or No in the Core and Tails III, and the earlier posts … II, and … I.

When I wrote Yes or No in the Core and Tails III, the variance in the was obvious in the diagram on minimization in machine learning, but the bias was not. I had thought all along that not filling the entire tree should have made the distribution skewed and kurtotic. But the threshold to having a normal distribution is so big, 211, that we are effectively dividing the skew and kurtosis numbers by 11, or more generally by the number of tiers in the binary tree. That makes the skew and kurtosis negligible. So we are talking about 248/2048=0.1211.

Enjoy.

### Sandwiches

July 20, 2018

Joshua Rothman’s “Are Things Getting Better or Worse” talks about an interesting reality of human perception. Things get better, but we don’t see it. Better happens on the scales larger than the individual. Worse happens on the smaller scale of the individual. We have to reach to see that better.

The article mentioned the statistical view of normal distributions with their thin tails as constants contrasting them with thick tails as underestimated surprises. Yes, once a distribution achieves normality slightly south of n = 2 11 data point where skew is gone, and excess kurtosis is gone as well, surprise is slow and resisted. A normal distribution becomes a Cauchy, aka a thick-tailed, distribution when some epsilon asserts itself under the normal when some logic erodes, or some new logic is birthed when as a new subgraph inserts itself in the graph defining the undermined normal.

Rothman went on to mention the population bomb whose explosion we managed to defuse. He frames it as a debate, as A vs B, as in A XOR B, two rhetorically mutually exclusive outcomes, Borlaug’s and Vogt’s, except that they were simultaneous and independent. The world decided to do both. The world adopted both.

The underlying beliefs required the adoption of Borlaug’s greening and agricultural innovative technologies and simultaneously adopting Vogt’s population control mechanisms, which beyond China turned out to be the spread of prosperity. The opposing adoptions involved two categories each with their own technology adoption lifecycles (TALC). The innovations exploded outward from the problem they resolved.

In the figure above, I made no determinations as to what phases the technologies were in. Those technologies are commodities now. And, the wins were determined after the fact, long after the problem was addressed. Realize that there are n dimensions to the problem and some m < n dimensions, fewer, technologies being adopted to address the problem.

That mutually exclusive framing struck a chord with me. That XOR sits between two things, the meat between two pieces of bread, aka a sandwich.

Sandwiches turn out to be typical of mathematics. Ranges like 0 < 3x + 5y < 187 are sandwiches. Once a mathematician finds one such object, the next mission is to delineate an extent. For a biologist, finding a previously uncataloged squirrel is the existence moment. The next question is how many of them are there and where do they live which resolve into a collection of ranges. In the technology adoption lifecycle, a phenomenon organized by the pragmatism of the underlying populations, again we see ranges. And ranges are sandwiches. A value chain or an ecology is a collection of sandwiches. Is it in or out of the meat of the matter?

The immediate example of a sandwich is linear algebra or more precisely linear programming. There can any number of constraints operating on a given problem. The solutions to the problems are the areas bounded by the collection of constraints, each constraint being a linear equation involving inequalities.

Every constraint has its own technology adoption lifecycle. It might be that a constraint is completely new or discontinuous. More typically, a constraint will be moved by continuous innovation or normal science. As an area is defined by any number of constraints, we have numerous dimensions in which to innovate.

Enjoy.

### Trapezoids

July 16, 2018

I work up this morning with trapezoids on my mind. What the heck? I’ll be using them to represent generative adversarial networks (GANs). The input for a GAN is the output of another neural network. GANS take that output and minimize the number of incorrect findings in that output.

We’ll get there through the triangle model. A triangle represents a decision tree. Back in the waterfall, you started with the requirements phase. Then, you took the requirements into the design phase where you traded off enablers and constraints against the requirements. This got you an architecture. From there you wrote the functions, did the unit testing, then it was shipped to the testing department. Yes, we don’t do that these days. All of those phases fit into one triangle.

So I started this thing off a long way from the triangle model, traversed many triangles, and ended up with a trapezoid before I got to a GAN. And, I finished with several GANs. I end with a few notes on “don’t cares.”

A triangle starts somewhere, anywhere, well, where you are. It starts with one point, the origin. That point has to be someplace in space and time. That point has to be someplace in logic. That point has to be someplace in a set of assertions. Those assertions start somewhere and end somewhere else. That point is in a place, a place full of assertions. The circle represents the extent of the place, the extent of the assertions.

In the linear programming sense, a place can be an area defined by a set of constraints defined as a collection of inequalities.  Research in all domains attempts to move or break the constraints limiting our ability to get things done. Once a constraint is broken, we can do something we could not do before, or do it someplace we couldn’t do it before. Once a constraint breaks, we discover new constraints that define the extent of the new area. Infinity or finiteness limit us.

So here we are in our place looking out from the center of our world to some distant destination. We see a path. We wonder how far it is from here to there. We propose a solution. We propose a definition. We give the line from here to there, a distance. But, we’ve defined it with things unknown in our place. The term we are defining is not fair and balanced. It is asymmetrical, so we have to learn more. We have to keep trying to find a definition more symmetrical than what we now have.

Notice that we have a triangle formed by the black realization line and the redlines delineating the extent of the decision tree. The definition is a decision tree that is expressed in a generative grammar and built from the edge of the outer circle to the line exhibiting some distance.

Alas, the definition must be within our place. The decision tree must change shape. So with the realization line as the base of the triangle, we change our definition of distance until it is entirely inside our place. We change our definition until it is symmetric. We conducted experiments by adding, subtracting, and changing our assertions. We worked outward from the orign in a top-down manner until we reached our goal.

The learning implied by the original asymmetry and completed once we achieved symmetry moved us from one definition through a series of additional definitions and finally arriving at a better definition. When we moved from this better definition, we became asymmetric again. All of this took time. We learn at different rates. Some learned it faster, the thin line at the bottom of the surface. We planned on it taking a certain amount of time to learn, the thick line. And, some took longer, the thin line at the top of the surface. Each learner traversing different distances at different rates.

A game can be described as a triangle. The game tree begins at some origin and the game space, where the game is played, the game tree, explodes generatively outward only to encounter the constraints. Further play focuses towards the eventual win or loss. Here I’ve illustrated a point win.

This game is one of sequential moves. Before a game can be played, the rules and the board must be defined. The rules define moves applied generatively, and constraints that filter moves and defines wins and losses.

A game can also have a line solution, rather than a point solution. Chess is a game with a point solution representing a checkmate. There are other situations like a draw, so chess has a line solution that includes all the alternatives to continued play. While I’ve drawn this line as a continuous line, it could be represented by a collection of intervals occurring at different times.

Here the notion of assertions having a distance let me define some distances from the origin. I’ve called this the assertional radii. Each individual assertion has a distance of one, so six assertions would give us an assertional radius of six. Six would be the maximum distance. If two of those six assertions are used to build an assertion that ANDs those two assertions, one assertion would be subtracted from the six. In the figure, we have two AND assertions done in such a manner as to eliminate two assertions so the assertional radii of that branch of the tree would be two less than the maximum.

The brown area represents losses; the white area, wins; the yellow area, prohibited play, aka cheating.

So we’ll leave games now.

The triangle model has at times confused me. Which way does it grow? In the waterfall, it grew from requirements to the interface, and use beyond that. In Yes or No in the Core and Tails III, ontologies grew outward from the root to the base, and taxonomies grew from the base to the root. Ontology works towards realization. Taxonomy works off of the realization.

The symmetry in this figure is accidental.

Neural nets work from the examples of realizations. Neural nets work from the base to either a point solution or a line solution. Here the weights are adjusted to generate the line solution or the point solution. Point solutions can be viewed as a time series. In both solutions, we are given a sequence of decisions with varying degrees of correctness. These sequences are the outputs of the machine learning exercise. Line solutions give us trapezoids.

Generative adversarial networks (GANs) are a recent development in machine learning. They classify the outputs of a neural net and try to improve upon them. The red and blue trapezoids generate performance improvements over the performance of the initial neural net, shown in black. The GANs are dependent on the initial neural net. The GANs are independent of each other. Building a hand recognizer on top of an arm recognizer is one example of an application of a GAN.

So I’ll end this discussion of GANs with a graphical notation of GANs. The above illustrations of GANs can be simplified to the following figure.

## Notes on Don’t Cares

Here I’ll expand on the discussion of don’t cares in Yes or No in the Core and Tails III

Twitter had me Googling for the  Area Model. Later while I drew up the assertional radius idea, it became clear to me that ANDing reduces the assertional radius. When you just OR the assertions into a long chain you get the maximum radius. ANDings generate a lesser distance. By setting that maximum distance as the bottom of the decision tree, the shorter distances make up the difference in the branching of the binary tree by replacing assertions with don’t cares.

Later in the day, shortcut math, aka multiplying a long sequence of factors with one that is zero, means every nomial other than the one that is zero becomes a don’t care.

How does today’s post tie into product management?

Design has many definitions. I’d go with an activity that is judged by some critical framework. Different disciplines use different critical frameworks. GANs are how you apply a critical framework to the output of a neural net. GANs can be stacked on top of each other to any depth. Many GANs can be applied to the same output of a neural net.

Earlier in the week, I got into a discussion with a UI designer that was insisting that simple was best. I was saying that different points in the technology adoption lifecycle require different degrees of simplification and complexity. Yes, late mainstreet or later requires simplicity but I’ve found much simplicity just moving from functionality type programming to web pages, and from web pages to devices, and devices to the cloud. Form factors force simplicity. Complications here arise when the form factor gets in the way of the work. Anyway, Simplicity is apparently an ideology. We couldn’t discuss the issue. It was absolute. Fitness to use and fitness to the user, particularly, the current user or the next pragmatism slice through our prospects matters more than absolute simplicity.

During Web 1.0, we were selling consumer goods to geeks. Geez. If it gets too simple and the users are geeks, you’ve made a mistake, a huge mistake. Even geeks make mistakes when we discuss some new machine learning tool that simplifies the effort to apply that technology because soon enough it will be too simple to make any money doing it.

Asymmetries mean that learning is required. Learning rates differ in a population gradient. Know how much the user is going to have to learn in every release. Is that negative use cost going to be spent by your users?

Enjoy!

### Yes or No in the Core and Tails III

July 2, 2018

So the whole mess that I mentioned in Yes or No in the Core and Tails II, kept bothering me. Then I realized that the order of the decisions didn’t matter. I can move the don’t cares to the bottom of my tree. It took a while to revise the tree. In the meantime, I read Part 2 of the Visual Introduction to Machine Learning, which led me to believe that moving the don’t cares was the correct thing to do.

The figure too small to see. But it is a complete binary tree of size 211, which takes us to 2048 bits, or a sample size of n=2048. Notice that we achieve normality at n=1800. This situation should present us with a skewed normal, but somehow the distribution is not skewed according to  John Cooks binary outcome sample size calculator. Of course, I’m taking his normality to mean standard normal. Those five layers of don’t cares give us some probability of 1/32, or p = 0.03125 at each branch at 26. Or, taking using the number from higher density portion of the tree, 1800/2048 = 0.8789, or the number from the from the lower density portion of the tree, 248/2048 = 0.1210. No, I’m not going to calculate the kurtosis. I’ll take John’s normal to be a standard normal.

The neural net lesson taught a nice lesson summed up by the figure about bias and variance. Yes, we are not doing machine learning, but another term for the same thing is statistical learning. We have the same problems with the statistical models we build for ourselves. We have bias and variance in our data depending on how we define our model, aka what correlations we use to define our model.

Model complexity is indirectly related to bias. And, model complexity is directly related to variance. Part 2 of the Visual Introduction to Machine Learning explains this in more depth if you haven’t read it yet.

Watch the zedstatistics series on correlation. It will take some time to see how his models changed their definitions over the modeling effort. He is seeking that minimum error optimization shown in the figure. Much of it involves math, rather than data.

Given that we have pushed our don’t cares down below our cares, we set ourselves up in a sort of Cauchy distribution. Cauchy distributions have thicker tails than normals as shown in the normal on the right. In some sense, the tail thickness is set by moving the
x-axis of the normal down. Here we did that by some epsilon. In a marketing sense, that would be an upmarket move without renormalization. But, in our “don’t care” sense the don’t cares are defining the thickness of that epsilon.

With normal distribution shown on the right, we are defining our known as what we got from our sample, our soft of known as the space of the don’t cares, and our unknowns as the yet to be surveyed populations. The soft of knowns represent our tradeoffs. We had to choose a path through the subtree, so we had to ignore other paths through the subtree. There were 32 paths or 25 paths of the 211 paths. Keep in mind that the don’t cares don’t mean we don’t care. Don’t cares allow us to solve a problem with a more general approach, which we usually take to minimize costs. But, in the marketing sense, it’s more that we didn’t ask yet. Once we ask and get a firm determination, we firm up one path from the 32 possible paths. We can use don’t cares to move forward before we have a definitive answer.

But, the bias and variance figure tells us something else. It tells us where in the machine learning sense the ideal solution happens to be. It is at the minimum of a parabola. In the frequentist sense, that minimum defines a specific standard deviation, or in the approach to the normal sense, that minimum tells us where our sample has become normal. It also tells us where we have become insensitive to outliers.

Once we have found the minimum, we have to realize that minimum in the development or definitional effort. Agilists would stop when they reach that minimum. Would they realize that they reached it? That is another matter. Ask if they achieved normality or not. But, the goal of machine learning is to approximate a solution with limited data, or approximating the parabola with a limited number of points on the parabola. Once you’ve approximated the parabola, finding the minimum is a mathematical exercise.

We can represent the product as a line through that minimum. That line would represent the base of a decision tree. I’ve represented these decision trees as triangles. Those triangles being idealizations. A generative effort in a constraint space is much messier than a triangle would suggest.

I’ve annotated the bias and variance graph with such a line. I’ve used a straight line to represent the realization. Every realization has an ontology representing the conceptualization to be realized. Every realization also has a taxonomy, but only after the realization. It boils down to ontologies before and taxonomies after. In the figure, the line from the minimum error to the baseline of the bias and variance graph is the target of the development effort. The realization line was projected and redrawn to the right. Then, the ontology and the taxonomy were added. Here the ontology and the taxonomy are identical. That is far from reality. The ontology and the taxonomy are symmetrical here, again far from reality.

The figure below the one on the right shows a messier view of a realization to be achieved over muliple releases. The solid red line has been released. There is an overall taxonomy, the enterprise taxonomy. And, there is the taxonomy of the user. The user’s effort generates some value that is significant enough to warrant continued development of the intended realization shown as red striped line. The user’s taxonomy is limited to the user’s knowledge of the carried content. The user’s knowledge might need to be enhanced with some training on the underlying concept. The user may not know the underlying conceptual model defined in the ontology. The developers might not know the underlying conceptual model either.

We cannot feed an ontology to a neural network. And, that neural network won’t discover that ontology. When Google wrote that Go playing application, it discovered a way to play Go, that no humans would have discovered. There are more ways to get to a realization than through ontologies and taxonomies.

The value of a realization is achieved by projecting effort through the realization. That value is evaluated relative to a point of value. That value is evaluated by some valuation baseline. Different managers in an enterprise would have different views of the realization, and different valuation baselines.

The symmetries, asymmetries, and axes of those symmetries that I highlighted are significant indicators of what must be learned, and who must learn what is being taught. Value realization is tied to what must be taught. The need to teach like the need to design interfaces are signals that underlying ontology was not known to the users, and not known and subsequently learned by the developers. The need to teach and design shows up more in products designed for sale or external use.

So what is a product manager to do? Realize that the number of samples is much larger than what Cook’s formula tells us the minimum number of samples would be. Don’t cares are useful minimizations. There is one ontology and many taxonomies. Agile assumes that the ontology will be discovered by the developer. When the UI is not straightforward, the ontology has been departed from. And, there are many views of value and many valuation baselines.

Enjoy.