Archive for the ‘Uncategorized’ Category

Customer Feedback

October 14, 2018

I’m old, ancient, pre-internet. The thing the internet does with server logs was something we did without server logs. It was a new practice at the time to track every interaction with a touchpoint in a database, and to associate that interaction with a person. But, there were CEOs and others who would tell their marketers that they didn’t need a database. Geez. Then, the internet came along and much was forgotten. Everybody’s career depended on getting on the internet and keeping up with the rapidly moving practices in the marketer’s silo.

Eventually, SEO got away from most of us and ended up in the hands of SEO experts. And, in the race, print marketing communications went out of style, and many printers went out of business. Paper was weight, weight was expensive, and cost justifications that server logs gave the SEO crowd were nonexistent for print. It wasn’t the internet that disrupted print, it was server logs. Fixing this disruption should have been easy, but the print industry wasn’t listening. We already had the technology. Oh, well.

These days, product managers talk about talking to the customer. Really? What customer? Are we talking prospects, users, managers of users, managers of functional units, or managers of business units? The technology adoption lifecycle defines the person we are selling to differently in different phases. Alternate monetizations drive another set of customer definitions. So who the hell are we supposed to talk to?

With SaaS-based applications, every click can be analyzed via SEO methods to generate a long tail of feature use. We can associate these tails with users and customers across all of our definitions. We could know what was going on as soon as we had a normal distribution for a particular click. Sorry, Agilist, but you don’t have enough data. Much would be seen in the changes to those long tails.
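A minimal sketch of that idea, in Python, assuming a hypothetical event stream with made-up user_id and feature fields; it just ranks features by click volume into a long tail and pivots the same events per user.

```python
from collections import Counter

# Hypothetical click events pulled from a SaaS server log; the field names
# (user_id, feature) and the values are illustrative, not from any product.
events = [
    {"user_id": "u1", "feature": "export_pdf"},
    {"user_id": "u2", "feature": "export_pdf"},
    {"user_id": "u1", "feature": "bulk_edit"},
    {"user_id": "u3", "feature": "export_pdf"},
    {"user_id": "u2", "feature": "api_keys"},
]

# Rank features by click volume: the head of the list is the fat end of the
# long tail; the rarely used features trail off behind it.
feature_tail = Counter(e["feature"] for e in events).most_common()
print(feature_tail)   # [('export_pdf', 3), ('bulk_edit', 1), ('api_keys', 1)]

# The same events, pivoted per user, associate a tail with each customer.
per_user = Counter((e["user_id"], e["feature"]) for e in events)
print(per_user.most_common())
```

Watching how those ranked counts shift from release to release is the “changes to those long tails” part.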

In that earlier era, when every touchpoint captured data, we could watch prospects mature as they crossed the normal distribution of our technology adoption lifecycle. We could watch their onboarding, their use, their learning, their development of their expertise, and their loyalty effects.

Today, I checked out a book that was mentioned on a blog that was mentioned in my Twitter stream. Amazon showed the customer reviews in an interesting manner that I’d not seen before. It broke down the average score into the contributing levels of satisfaction. As it is, this is great for retention efforts, and social networking efforts. It would also be useful in our presale market communications efforts. Prospects are ready to buy only when they reach a 5-star rating across all the marketing communications they’ve touched across the entire buying team.

It would be great in our efforts to develop new features, aka use cases, user stories, and other such efforts. We could push this further by capturing negative reviews, which, when tied to the application’s long tail and the individual customer, would tell us what we needed to do to retain the customer across all definitions of the customer. If a customer that gave us rave reviews suddenly isn’t giving them, it wouldn’t be sudden if we were paying attention, and it wouldn’t have to end with a retention effort. There is a long tail of customers, not just SKUs. In a software application, every feature is an SKU.

All of this would require an infrastructure that more widely defined what we captured in our server log and what analytic equivalence would look like in all these uses beyond SEM.

In the adoption lifecycle, we could break down the clicks from every pragmatism slice. That would tell us how soon a given pragmatism slice would be ready to buy, and that would inform the marketing communications effort and the feature development effort. We’d know what that pragmatism slice wanted and when. We’d know how well our marketing communications is working for that slice. It would greatly inform our tradeoffs.

One last thing, customers don’t know what we need them to know, so they can’t tell us about the future. Without good definitions of the generic customer, we could be talking to the wrong customer and addressing the wrong issues. We could be taking value to a particular “customer” that would never care about the delivery of that value.

Enjoy.


We are never far from the exception

October 9, 2018

I’ve played around with a new compass this weekend. Out on Twitter, Antonio Gutierrez posted his Geometry Problem 1392, Two Cyclic Quadrilateral, Cyclic Octagon, Circle. The figure looked interesting enough, so I drew a few and found them impossible. Hint, draw the circles first. I drew the quadrilaterals first. A funny thing happened on the way to the barn.

Yes, there is a process for inscribing a quadrilateral. The process works. But, more often than not, it does not work. But, just for the fun of it, draw a quadrilateral and only the quadrilateral. Draw it on a blank sheet of paper.

The procedure is to find the midpoints of the sides. Then, draw lines connecting the opposite midpoints. The intersection of those two lines would be the center of the circle circumscribing the quadrilateral. Try it!
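Here is a minimal sketch of that procedure, in Python with numpy, using made-up vertices; it intersects the two midpoint-connecting lines and then checks whether that point is equidistant from the four vertices, which is the part that usually fails.

```python
import numpy as np

def midpoint_center(quad):
    """Apply the described procedure: join midpoints of opposite sides and
    intersect the two connecting lines."""
    a, b, c, d = (np.array(p, dtype=float) for p in quad)
    m_ab, m_bc = (a + b) / 2, (b + c) / 2
    m_cd, m_da = (c + d) / 2, (d + a) / 2
    # Solve m_ab + t*(m_cd - m_ab) = m_bc + s*(m_da - m_bc) for t and s.
    A = np.column_stack([m_cd - m_ab, -(m_da - m_bc)])
    t, _ = np.linalg.solve(A, m_bc - m_ab)
    return m_ab + t * (m_cd - m_ab)

quad = [(0, 0), (4, 0), (5, 3), (1, 4)]        # a freely drawn quadrilateral
center = midpoint_center(quad)
radii = [np.linalg.norm(center - np.array(p)) for p in quad]
print(center, radii)   # the radii generally disagree: no circumscribing circle
```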

Theoretically, of course.

Because you tried it and failed. Keep trying. Keep failing. Call it risky. Conclude that inscribing a quadrilateral is risky. Then, go forward with something simpler, something more achievable, something more amenable to process, something simpler to invest in.

But there is a cheat. Draw the circle and then draw the quadrilateral with the vertexes on the circle. Yes, you end up with a perfect inscribed quadrilateral. You are in Euclidean space.

In those earlier attempts, you were venturing into an unknown geometric space. But, as a manager who knows their numbers, a manager that cranks out numbers and decisions in L2, in the geometric space of the spreadsheet, you only travel in L2. A safe assumption is that L2 is Euclidean although I’ve seen the Euclidean metric being dropped into a spreadsheet. This was done to promote the Euclidean space from an assumption to an explicit assertion.

I’ll assume that I cannot inscribe a quadrilateral in L2.

I was doing this on paper. I’d need a vector graphics-based drawing application to show you. I’ll give it a shot in MS Paint. We need to see it. And, you probably don’t have a compass in that pocket protector you don’t wear outside the house.

First, I’ll draw the cheat, the circle, the root of all things Euclidean–well, a lot of Euclidean things. Circles are ideals. Mathematicians love their ideals because they make the calculations simple.

Back in our high-school geometry class, we were told that there are 360 degrees around a circle. If you didn’t believe this, oh, well, you flunked, and you didn’t end up in b-school. Maybe you became an artist instead.

Then we were told that any inscribed triangle with a base passing through the center, aka on the diameter, would have a right angle opposite that base. Starting with a circle, this is easy. Starting with an arbitrary right triangle, rather than a circle, means we have to know a lot more before we can inscribe that right triangle. It’s still simple, but it’s more complicated to start with the triangle. And, if it isn’t a right triangle, we can still inscribe it, but being on the diameter is another matter.

So we have a situation that lets us talk about 360 degrees and another that lets us talk about 90 degrees.

Then, we ditched the circle with the triangle postulate, which told us that a triangle’s angles add up to 180 degrees. We, again, were forced to believe this or face grave consequences.

We went so far as to ditch the right angle as well. But, get this, then, they forced trigonometry on us. Oh, we also got rid of that diameter as well. So we end up with two more kinds of angles: acute and obtuse. Still, the process to draw a circle around a triangle worked, circumscribing a triangle. But, we then faced an explosion in the kinds of centers.

Did you catch that terminology change? When we start with a circle we inscribe the triangle. When we start with the triangle we circumscribe that triangle. Two different situations. Maybe AutoCAD makes that clear, I don’t know. It matters. It just doesn’t matter yet, so expect an Agilist to jump on the elementary, nearer one and object to implementing the further one. Oh, well, we can refactor later.

That explosion of centers illustrates a concept that translates well to the base of a normal distribution. The centers: the centroid, circumcenter, and incenter show up in different orderings for different triangles. For equilateral triangles, the centers are one and the same point. When the normal distribution is indeed normal, all three statistical centers, the mean, median, and mode, show up at the same point.
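A small Python sketch of that claim, using standard textbook formulas and made-up vertices: the centroid, incenter, and circumcenter coincide for an equilateral triangle and separate for a scalene one.

```python
import numpy as np

def centers(A, B, C):
    A, B, C = map(np.asarray, (A, B, C))
    a, b, c = np.linalg.norm(B - C), np.linalg.norm(C - A), np.linalg.norm(A - B)
    centroid = (A + B + C) / 3
    incenter = (a * A + b * B + c * C) / (a + b + c)
    # Circumcenter from the standard determinant formula.
    d = 2 * (A[0] * (B[1] - C[1]) + B[0] * (C[1] - A[1]) + C[0] * (A[1] - B[1]))
    ux = ((A @ A) * (B[1] - C[1]) + (B @ B) * (C[1] - A[1]) + (C @ C) * (A[1] - B[1])) / d
    uy = ((A @ A) * (C[0] - B[0]) + (B @ B) * (A[0] - C[0]) + (C @ C) * (B[0] - A[0])) / d
    return centroid, incenter, np.array([ux, uy])

# Equilateral: all three centers land on the same point.
print(centers((0, 0), (1, 0), (0.5, np.sqrt(3) / 2)))
# Scalene: the three centers separate.
print(centers((0, 0), (4, 0), (1, 3)))
```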

Moving up to quadrilaterals, we add a point and a line and start to run through the geometries of four-sided things. We can circumscribe squares, rectangles, and all the “pure” forms, the ideals again. But, the defects have us skewered on the fork at the question of the circle first or later.

So we’ll walk through that argument next, starting with an inscribed cyclic quadrilateral; we’ll draw the circle first, aka use the Euclidean cheat.

“If you’re given a convex quadrilateral, a circle can be circumscribed about it if and only if the quadrilateral is cyclic.” That quote came from Stack Exchange via a Google search. Yes, a cyclic quadrilateral is where we started, trying to circumscribe a quadrilateral. By assuming it was cyclic, we got it done, but we set ourselves up with the needle from the haystack, so we didn’t have to find the needle. Always bring your own needle to a haystack. That needle is Euclidean space. We assume that if-and-only-if rule when we start with a circle. The center we used is the center of the circle. The center we will find from the quadrilateral itself is not the center of the circle.

We’ll look at the intersection of the diagonals. But, before we get there, notice that none of the angles are right angles, and none of the lines lie on a diameter. To be diameters they would have to include the center of the circle, that black dot. We’ve already departed Euclidean space.

The intersection of the diagonals is not the correct procedure for finding the center of the circumscribing circle. The process I referred to earlier was one of finding the midpoints of the sides and joining the midpoints of the opposite sides. I did it in MS Paint. That all by itself introduces errors. But, by and large, the error here is larger than the error of MS Paint’s bitmap resolution, aka the quantization error. We got closer, but still wrong and obviously so.

The math of the thing assumes that the circumscribed circle would be in Euclidean space, so opposing angles, those connected by diagonals in the earlier figure, would add up to 180 degrees. I don’t know the number of degrees, but we can still add them up visually.

The area in question is a matter of whether it is equal to the area marked C. If we stick with a Euclidean space, the two have to add up to 360 degrees.

Cyclic means that opposing angles add up to 180 degrees. Not being cyclic means we cannot circumscribe the quadrilateral in Euclidean space. Not adding up to 180 degrees means we are short a few degrees, aka we are in hyperbolic space, or we are over a few degrees, aka we are in spherical space. We’ve seen this before with our evolution to the normal distribution.
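A minimal sketch of that test, in Python with numpy, using a made-up quadrilateral; the hyperbolic and spherical labels follow the analogy in this post, not a formal change of geometry.

```python
import numpy as np

def interior_angles(quad):
    """Interior angles of a simple convex quadrilateral given in order."""
    pts = [np.array(p, dtype=float) for p in quad]
    angles = []
    for i in range(4):
        prev, cur, nxt = pts[i - 1], pts[i], pts[(i + 1) % 4]
        u, v = prev - cur, nxt - cur
        cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        angles.append(np.degrees(np.arccos(np.clip(cosang, -1, 1))))
    return angles

quad = [(0, 0), (4, 0), (5, 3), (1, 4)]      # freely drawn, probably not cyclic
a0, a1, a2, a3 = interior_angles(quad)
for pair in (a0 + a2, a1 + a3):
    if abs(pair - 180) < 1e-6:
        verdict = "cyclic: the Euclidean case"
    elif pair < 180:
        verdict = "short of 180: the hyperbolic-like defect"
    else:
        verdict = "over 180: the spherical-like excess"
    print(round(pair, 2), verdict)
```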

Keep in mind that we are just drawing. We didn’t measure. We didn’t pull out a protractor. We are not running around trying to add up degrees and seconds. We are not navigators trapped in Mercator geometry. Oh, projection!

As product managers, we project profits in L2, and that is why we never try to innovate discontinuously. We are projecting in L2, but the underlying space is not Euclidean, so those nice numbers don’t tell us the truth. The underlying space is almost never Euclidean.

But, the point of this post is to find out if space is hyperbolic or spherical. If the sum of the opposite angles on B were more, then we are hyperbolic in one dimension and spherical in the other. Then, we have to know where the transition happens. Then, we would like to know what the rates of transition would be.

We see the same thing in our data collection towards a normal. We are asking the same questions. There we can see our skew and kurtosis and our rates of data collection. I’m not p-hunting. I’m hunting for my normal. I’m hunting for my short tail so I can invest with some expectations about risk, about stability. Be warned that there are plenty of short tails in the sense of belief functions in fuzzy logic, in the swamped-by-surface sense. There is a geologic-like structure to probability densities, but we hide those from ourselves with our dataset practice, a practice about preventing p-hunting. We are not p-hunting. We are looking for investments hiding in “bad” numbers, numbers that appear bad because we insist on L2, thus we insist on the 10th-grade Euclidean space of high school. Nowadays, even that space is not strictly Euclidean.

I’ve hinted at heterogeneous spaces. Trying to circumscribe a freely drawn quadrilateral reveals how space transitions to and from geometries generating homogeneous spaces.

“Because in reality there does not exist isolated homogeneous spaces, but a mixture of them, interconnected, and each having a …”

a quote in the Google search results citing An Introduction to the Smarandache Geometries, by L. Kuciuk and M. Antholy.

We do business in those spaces, spaces where the ideal, the generic, are fictions. Discontinuous innovation happens in hyperbolic space. Continuous innovation happens in Euclidean and spherical spaces with the spherical being the safest bet. And, that hyperbolic space being the riskiest. We no longer invest in discontinuous innovation because we believe it is risky. It appears that such investments would offer little return because in hyperbolic space the future looks smaller than it will be.

And, in the sense of running a business, “One geometry cannot be more valid than another; it can only be more convenient.” Henri Poincaré (1854 – 1912), Science and Hypothesis (1901).

So at last, we will encircle an indigenous quadrilateral in the wild. This is probably not a good example since it is concave. The circles generate a conic, but the points of the quadrilateral that are on the circles hint at a more complex shape. The circles hint that the geometry changes when we move from one circle to another. The smallest circle gives us three spherical geometries; the largest circle, only one. The smallest circle gives us no hyperbolic geometries; the largest circle, two hyperbolic geometries.

Given that most companies work in the late mainstreet adoption phase selling commodities from within a spherical space, we rarely dip into the hyperbolic space, except when we undertake a sampling effort towards the normal. In that effort, we might jump the gun and infer before we are back in Euclidean or our more native spherical space. So much for that inference. It will be fleeting. Know where you are Euclidean. Is it a normal with three sigma, or a normal with six sigma? It isn’t a normal with twenty sigmas. Your choice. It is not Euclidean when it is not yet normal, aka when it is skewed and kurtotic.

It makes me wonder if the replication crisis is an artifact of inferring too early. A statistician out on Stack Exchange insisted that you can infer with kurtotic and skewed distributions. Not me. Is the inference fleeting, a throwdown, or a classroom exercise?

Anyway, back to those rates, those differentials, which takes us to two more diagrams. The first one shows us what happens as we achieve the Euclidean, aka the cyclic; the next one, as we achieve the spherical.

Enough already. Thanks. And, enjoy!


The Direction of Learning

September 13, 2018

In my recent reading, I came across “When Bayes, Ockham, and Shannon come together to define machine learning.” That led me to this figure. This figure adds to the bias and variance interactions that I wrote about in Bias. That post was extending the notion of what it took to be normal. That normality brought with it symmetry. Symmetry happens when you have learned what there was to learn. Asymmetry calls out demanding learning.

In the above figure, I annotated several events. Gradient descent brought the system to the optimum. Much like a jar of peanut butter getting on the shelf of your grocery store, there was much involved in achieving that optimum. Call that achievement an event.

Here I’ve annotated the intersections as events. On one side of the intersection, the world works one way, and on the other side, the world works another way. The phases of the technology adoption lifecycle are like that. Each phase is a world. In the figure here, all I can say is that the factors have changed their order and scale. These changes apply over an interval. That interval is traversed over time. Then, the next interval is traversed. Consider the intervals to be defined by their own logic. The transitions can be jarring. As to the meanings of these worlds, I’ll have to know more before I can let you know.

John Cook tweeted about a post on estimating the Poisson distribution from a normal. That’s backward in my thinking. You start collecting data, which initially gives you a Poisson distribution, then you approximate the normal long before normality is achieved. Anyway, Cook’s post led me to this post, “Normal approximation to logistic distribution.” And, again, we are approximating the logistic distribution with a normal. I took his figure and used it to summarize the effects of changes to the standard deviation, and with it the variance.

Normal Distribution Approximation of the Logistic Distribution

The orange circles are not accurate. They represent the extrinsic curvature of the tail. The circle on the right should be slightly larger than the circle on the left. The curvature is the inverse of the radius. The standard deviations are 1.8 for the approximating normal on the left, and 1.6 for the approximating normal on the right. The logistic distribution is the same in both figures.

On the left, the approximation is loose and leaves a large area at the top between the logistic distribution and the approximating normal. As the standard deviation is decreased to the optimal 1.6, that area is filled with some probability mass that migrated from the tails. That changes the shape of the tail. I do not have the means to compute the tails accurately, so I won’t speak to that issue. I draw to discover things.
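A small sketch of that comparison, in Python with scipy, assuming the standard logistic and the two standard deviations quoted above; it scores each approximating normal by its largest pointwise density gap from the logistic.

```python
import numpy as np
from scipy.stats import logistic, norm

# Compare the standard logistic density with two approximating normals, using
# the 1.8 and 1.6 standard deviations discussed above.
x = np.linspace(-10, 10, 2001)
for sigma in (1.8, 1.6):
    gap = np.max(np.abs(logistic.pdf(x) - norm.pdf(x, scale=sigma)))
    print(f"sigma = {sigma}: max density gap = {gap:.4f}")
```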

The logistic distribution is symmetric. And, the normal that Cook is using is likewise symmetric. We are computing these distributions based on their formulas, not on data collection over time. From my earlier discussions of kurtosis, we know that while data is being collected over time, kurtosis goes to zero. That gives us these ideal distributions, but in the approximation process much is assumed. Usually, distributions are built around the assumptions of a mean of zero and a standard deviation of one. I came across a generalization of the normal that used skew as a parameter.

It turns out that the logistic distribution is subject to a similar generalization. In this generalization, skew, or the third moment is used as a parameter. These generalizations allow us to use the distributions in the absence of data.

Skew brings kurtosis with it.

In the first article cited in this post, the one that mentions Bayes, a Bayesian inference is seen as a series of distributions that arrive at a truth in a Douglas MacArthur island hopping exercise, or playing a game of Go where the intersections are distributions. It’s all dynamic and differential, rather than static in the dataset view that was created to prevent p-hunting, yet p-hunting has become the practice.

So these generalizations found skew to be an important departure from the ungeneralized forms. So we can look at these kurtotic forms of the logistic distribution.

Generalized Logistic Distribution

Here, shown in black, we can see the ungeneralized form of the logistic distribution. It has two parameters: the mean and the standard deviation. The generalization adds a third parameter, skew. The red distribution has a fractional skew that is less than one. The blue distribution has a skew greater than one. Kurtosis is multiplicative in this distribution. The kurtosis orients the red and blue distributions via their long and short tails. Having a long tail and a short tail is the visual characteristic of kurtosis. Kurtotic distributions are not symmetrical.

Kurtosis also orients the normal. This is true of both the normal and the generalized skew-normal. In the former, kurtosis is generated by the data. In the latter, kurtosis is generated by the specification of the skew parameter. The latter assumes much.
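The skew-normal mentioned here is available off the shelf. A small sketch in Python with scipy, assuming scipy’s single shape parameter stands in for the skew parameter discussed above, shows that dialing in skew drags excess kurtosis along with it.

```python
from scipy.stats import skewnorm

# scipy's skew-normal takes one shape parameter a; a = 0 recovers the ordinary
# normal. Asking for the skew and excess-kurtosis moments shows that a nonzero
# skew parameter brings nonzero kurtosis with it.
for a in (0, 1, 4):
    skew, kurt = skewnorm.stats(a, moments="sk")
    print(f"a = {a}: skew = {float(skew):.3f}, excess kurtosis = {float(kurt):.3f}")
```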

It would be interesting to watch a skew-normal distribution approximate a skew-logistic distribution.

The three distributions in the last figure illustrate the directionality of the kurtosis. This kurtosis is that of a single dimension. When considered in the sense of an asymmetrical distribution attempting to achieve symmetry, there is a direction of learning, the direction the distribution must move to achieve symmetry.

We make inferences based on the tails involved. Over time the long tail contracts and the short tail lengthens. Statisticians argue that you can infer with kurtotic distributions. I don’t know that I would. I’d bet on the short tails. The long tails will invalidate themselves as more data is collected. The short tails will be constant over the eventual maturity, the differential achievement of symmetry, or the learning of the distribution.

This learning can be achieved when product developers learn the content of their product and make it fit the cognitive models of their users; when marcom, training, and documentation enable users to learn the product; and lastly, when we change the population so its members more closely fit the idealized population served by the product. All three of these learnings happen simultaneously, and optimally without conflict. Each undertaking would require data collection. And, the shape of the distribution of that data would inform us as to our achievement of symmetry, or the success and failure of our efforts.

The technology adoption lifecycle informs us as to the phase, or our interval and its underlying logic. That lifecycle can move us away from symmetry. We have to learn back our symmetry. The pragmatism that organizes that lifecycle also has effects within a phase. This leaves us in a situation where our prospects are unlike our customers or installed users. Learning is constant so divergence from symmetry is also constant. We cannot be our pasts. We must be our present. That is hard to achieve given the lagging indications of our distributions.

Enjoy!

Bias

July 23, 2018

Tonight, a tweet from @CompSciFact led me to a webpage, “Efficiently Generating a Number in a Range.” In a subsection titled “Classic Modulo (Biased),” the author mentions how not generating the entire base of the binary tree when seeking a particular range makes the random number biased. I came across this but didn’t have a word for it when I was trying to see how many data points I would need to separate a single binary decision. I wrote about this in Trapezoids, Yes or No in the Core and Tails III, and the earlier posts … II and … I.

When I wrote Yes or No in the Core and Tails III, the variance was obvious in the diagram on minimization in machine learning, but the bias was not. I had thought all along that not filling the entire tree should have made the distribution skewed and kurtotic. But the threshold for having a normal distribution is so big, 2^11, that we are effectively dividing the skew and kurtosis numbers by 11, or more generally by the number of tiers in the binary tree. That makes the skew and kurtosis negligible. So we are talking about 248/2048 = 0.1211.
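Here is a minimal illustration of the classic modulo bias itself, in Python, with a deliberately tiny made-up generator so the bias is visible by inspection.

```python
from collections import Counter

# Classic modulo bias, in miniature: a generator that yields 0..7 uniformly
# (3 bits), reduced modulo 6. The residues 0 and 1 each receive two source
# values, the rest only one, so the "random number in a range" is biased.
source = range(8)                # stand-in for a uniform 3-bit generator
counts = Counter(v % 6 for v in source)
print(sorted(counts.items()))    # [(0, 2), (1, 2), (2, 1), (3, 1), (4, 1), (5, 1)]
```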

Enjoy.


Sandwiches

July 20, 2018

Joshua Rothman’s “Are Things Getting Better or Worse?” talks about an interesting reality of human perception. Things get better, but we don’t see it. Better happens on scales larger than the individual. Worse happens on the smaller scale of the individual. We have to reach to see that better.

The article mentioned the statistical view of normal distributions with their thin tails as constants, contrasting them with thick tails as underestimated surprises. Yes, once a distribution achieves normality slightly south of the n = 2^11 data point, where skew is gone and excess kurtosis is gone as well, surprise is slow and resisted. A normal distribution becomes a Cauchy, aka a thick-tailed, distribution when some epsilon asserts itself under the normal, when some logic erodes, or when some new logic is birthed as a new subgraph inserts itself in the graph defining the undermined normal.

Rothman went on to mention the population bomb whose explosion we managed to defuse. He frames it as a debate, as A vs B, as in A XOR B, two rhetorically mutually exclusive outcomes, Borlaug’s and Vogt’s, except that they were simultaneous and independent. The world decided to do both. The world adopted both.

Simultaneous Adoptions

The underlying beliefs required the adoption of Borlaug’s greening and agricultural innovative technologies and simultaneously adopting Vogt’s population control mechanisms, which beyond China turned out to be the spread of prosperity. The opposing adoptions involved two categories each with their own technology adoption lifecycles (TALC). The innovations exploded outward from the problem they resolved.

In the figure above, I made no determinations as to what phases the technologies were in. Those technologies are commodities now. And, the wins were determined after the fact, long after the problem was addressed. Realize that there are n dimensions to the problem and some m < n technologies, fewer than the dimensions, being adopted to address the problem.

That mutually exclusive framing struck a chord with me. That XOR sits between two things, the meat between two pieces of bread, aka a sandwich.

02 Sandwich

Sandwiches turn out to be typical of mathematics. Ranges like 0 < 3x + 5y < 187 are sandwiches. Once a mathematician finds one such object, the next mission is to delineate an extent. For a biologist, finding a previously uncataloged squirrel is the existence moment. The next questions are how many of them are there and where do they live, which resolve into a collection of ranges. In the technology adoption lifecycle, a phenomenon organized by the pragmatism of the underlying populations, again we see ranges. And ranges are sandwiches. A value chain or an ecology is a collection of sandwiches. Is it in or out of the meat of the matter?

The immediate example of a sandwich is linear algebra, or more precisely linear programming. There can be any number of constraints operating on a given problem. The solutions to the problems are the areas bounded by the collection of constraints, each constraint being a linear equation involving inequalities.

03 Linear Programming

Every constraint has its own technology adoption lifecycle. It might be that a constraint is completely new or discontinuous. More typically, a constraint will be moved by continuous innovation or normal science. As an area is defined by any number of constraints, we have numerous dimensions in which to innovate.
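A minimal sketch of such a constrained area, in Python with scipy, using made-up constraints that include the 0 < 3x + 5y < 187 range from the sandwich example above; the optimum lands on the boundary of the sandwich.

```python
from scipy.optimize import linprog

# A toy feasible region sandwiched by three constraints (plus x, y >= 0):
#   3x + 5y <= 187      (the range from the sandwich example above)
#   x  +  y <= 50
#   x       <= 40
# Maximize 2x + 3y; linprog minimizes, so negate the objective.
result = linprog(
    c=[-2, -3],
    A_ub=[[3, 5], [1, 1], [1, 0]],
    b_ub=[187, 50, 40],
    bounds=[(0, None), (0, None)],
)
print(result.x, -result.fun)   # the optimum sits on the boundary of the sandwich
```

Moving any one constraint, continuously or discontinuously, reshapes the feasible area, which is the sense in which each constraint carries its own lifecycle.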

Enjoy.


Trapezoids

July 16, 2018

I woke up this morning with trapezoids on my mind. What the heck? I’ll be using them to represent generative adversarial networks (GANs). The input for a GAN is the output of another neural network. GANs take that output and minimize the number of incorrect findings in that output.

We’ll get there through the triangle model. A triangle represents a decision tree. Back in the waterfall, you started with the requirements phase. Then, you took the requirements into the design phase where you traded off enablers and constraints against the requirements. This got you an architecture. From there you wrote the functions, did the unit testing, then it was shipped to the testing department. Yes, we don’t do that these days. All of those phases fit into one triangle.

So I started this thing off a long way from the triangle model, traversed many triangles, and ended up with a trapezoid before I got to a GAN. And, I finished with several GANs. I end with a few notes on “don’t cares.”

A triangle starts somewhere, anywhere, well, where you are. It starts with one point, the origin. That point has to be someplace in space and time. That point has to be someplace in logic. That point has to be someplace in a set of assertions. Those assertions start somewhere and end somewhere else. That point is in a place, a place full of assertions. The circle represents the extent of the place, the extent of the assertions.

In the linear programming sense, a place can be an area defined by a set of constraints defined as a collection of inequalities. Research in all domains attempts to move or break the constraints limiting our ability to get things done. Once a constraint is broken, we can do something we could not do before, or do it someplace we couldn’t do it before. Once a constraint breaks, we discover new constraints that define the extent of the new area. Infinity or finiteness limit us.

So here we are in our place looking out from the center of our world to some distant destination. We see a path. We wonder how far it is from here to there. We propose a solution. We propose a definition. We give the line from here to there, a distance. But, we’ve defined it with things unknown in our place. The term we are defining is not fair and balanced. It is asymmetrical, so we have to learn more. We have to keep trying to find a definition more symmetrical than what we now have.

Notice that we have a triangle formed by the black realization line and the red lines delineating the extent of the decision tree. The definition is a decision tree that is expressed in a generative grammar and built from the edge of the outer circle to the line exhibiting some distance.

Alas, the definition must be within our place. The decision tree must change shape. So with the realization line as the base of the triangle, we change our definition of distance until it is entirely inside our place. We change our definition until it is symmetric. We conducted experiments by adding, subtracting, and changing our assertions. We worked outward from the origin in a top-down manner until we reached our goal.

The learning implied by the original asymmetry and completed once we achieved symmetry moved us from one definition through a series of additional definitions and finally to a better definition. When we move on from this better definition, we become asymmetric again. All of this took time. We learn at different rates. Some learned it faster, the thin line at the bottom of the surface. We planned on it taking a certain amount of time to learn, the thick line. And, some took longer, the thin line at the top of the surface. Each learner traverses different distances at different rates.

A game can be described as a triangle. The game tree begins at some origin and the game space, where the game is played, the game tree, explodes generatively outward only to encounter the constraints. Further play focuses towards the eventual win or loss. Here I’ve illustrated a point win.

This game is one of sequential moves. Before a game can be played, the rules and the board must be defined. The rules define moves applied generatively, and constraints that filter moves and define wins and losses.

A game can also have a line solution, rather than a point solution. Chess is a game with a point solution representing a checkmate. There are other situations like a draw, so chess has a line solution that includes all the alternatives to continued play. While I’ve drawn this line as a continuous line, it could be represented by a collection of intervals occurring at different times.

11 Game - Losses

Here the notion of assertions having a distance let me define some distances from the origin. I’ve called this the assertional radii. Each individual assertion has a distance of one, so six assertions would give us an assertional radius of six. Six would be the maximum distance. If two of those six assertions are used to build an assertion that ANDs those two assertions, one assertion would be subtracted from the six. In the figure, we have two AND assertions done in such a manner as to eliminate two assertions so the assertional radii of that branch of the tree would be two less than the maximum.

12 Game - Assertional Grid

The brown area represents losses; the white area, wins; the yellow area, prohibited play, aka cheating.

So we’ll leave games now.

The triangle model has at times confused me. Which way does it grow? In the waterfall, it grew from requirements to the interface, and use beyond that. In Yes or No in the Core and Tails III, ontologies grew outward from the root to the base, and taxonomies grew from the base to the root. Ontology works towards realization. Taxonomy works off of the realization.

The symmetry in this figure is accidental.

Neural nets work from the examples of realizations. Neural nets work from the base to either a point solution or a line solution. Here the weights are adjusted to generate the line solution or the point solution. Point solutions can be viewed as a time series. In both solutions, we are given a sequence of decisions with varying degrees of correctness. These sequences are the outputs of the machine learning exercise. Line solutions give us trapezoids.

Generative adversarial networks (GANs) are a recent development in machine learning. They classify the outputs of a neural net and try to improve upon them. The red and blue trapezoids generate performance improvements over the performance of the initial neural net, shown in black. The GANs are dependent on the initial neural net. The GANs are independent of each other. Building a hand recognizer on top of an arm recognizer is one example of an application of a GAN.

So I’ll end this discussion of GANs with a graphical notation of GANs. The above illustrations of GANs can be simplified to the following figure.


Notes on Don’t Cares

Here I’ll expand on the discussion of don’t cares in Yes or No in the Core and Tails III.

Twitter had me Googling for the Area Model. Later, while I drew up the assertional radius idea, it became clear to me that ANDing reduces the assertional radius. When you just OR the assertions into a long chain you get the maximum radius. ANDings generate a lesser distance. By setting that maximum distance as the bottom of the decision tree, the shorter distances make up the difference in the branching of the binary tree by replacing assertions with don’t cares.

Later in the day, shortcut math, aka multiplying a long sequence of factors with one that is zero, means every factor other than the one that is zero becomes a don’t care. A small sketch of this follows the figure below.

19 Dont Cares in Math
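Here is that sketch, in Python, with made-up factors; the same don’t-care shortcut shows up in logic as short-circuit evaluation.

```python
# Shortcut math as don't cares: once one factor is zero, the remaining factors
# no longer matter, so they can be skipped entirely.
factors = [3, 0, 7, 42, 19]
product = 1
for f in factors:
    if f == 0:            # every factor after this point is a don't care
        product = 0
        break
    product *= f
print(product)            # 0, without ever touching 7, 42, or 19

# The logical twin: with AND, a single False turns the rest into don't cares.
def expensive_check():
    raise RuntimeError("never reached")

print(False and expensive_check())   # False; short-circuiting skips the call
```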


How does today’s post tie into product management?

Design has many definitions. I’d go with an activity that is judged by some critical framework. Different disciplines use different critical frameworks. GANs are how you apply a critical framework to the output of a neural net. GANs can be stacked on top of each other to any depth. Many GANs can be applied to the same output of a neural net.

Earlier in the week, I got into a discussion with a UI designer who was insisting that simple was best. I was saying that different points in the technology adoption lifecycle require different degrees of simplification and complexity. Yes, late mainstreet or later requires simplicity, but I’ve found much simplicity just moving from functionality-type programming to web pages, from web pages to devices, and from devices to the cloud. Form factors force simplicity. Complications here arise when the form factor gets in the way of the work. Anyway, simplicity is apparently an ideology. We couldn’t discuss the issue. It was absolute. Fitness to use and fitness to the user, particularly the current user or the next pragmatism slice through our prospects, matters more than absolute simplicity.

During Web 1.0, we were selling consumer goods to geeks. Geez. If it gets too simple and the users are geeks, you’ve made a mistake, a huge mistake. Even geeks make this mistake when we cheer some new machine learning tool that simplifies the effort to apply that technology, because soon enough it will be too simple to make any money doing it.

Asymmetries mean that learning is required. Learning rates differ in a population gradient. Know how much the user is going to have to learn in every release. Is that negative use cost going to be spent by your users?

Enjoy!


Yes or No in the Core and Tails III

July 2, 2018

So the whole mess that I mentioned in Yes or No in the Core and Tails II, kept bothering me. Then I realized that the order of the decisions didn’t matter. I can move the don’t cares to the bottom of my tree. It took a while to revise the tree. In the meantime, I read Part 2 of the Visual Introduction to Machine Learning, which led me to believe that moving the don’t cares was the correct thing to do.

Decision Tree 3

The figure is too small to see. But it is a complete binary tree of size 2^11, which takes us to 2048 bits, or a sample size of n = 2048. Notice that we achieve normality at n = 1800. This situation should present us with a skewed normal, but somehow the distribution is not skewed according to John Cook’s binary outcome sample size calculator. Of course, I’m taking his normality to mean standard normal. Those five layers of don’t cares give us some probability of 1/32, or p = 0.03125, at each branch at 2^6. Or, using the number from the higher-density portion of the tree, 1800/2048 = 0.8789, or the number from the lower-density portion of the tree, 248/2048 = 0.1210. No, I’m not going to calculate the kurtosis. I’ll take John’s normal to be a standard normal.
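For a sense of how the skew and kurtosis shrink with n, here is a small Python sketch using the standard binomial shape formulas; the p value is just the 248/2048 split mentioned above, not a measured quantity.

```python
import math

# Skewness and excess kurtosis of a binomial(n, p) count; both shrink as n
# grows, which is the sense in which the skew becomes negligible by n = 2**11.
def binomial_shape(n, p):
    q = 1 - p
    skew = (q - p) / math.sqrt(n * p * q)
    excess_kurtosis = (1 - 6 * p * q) / (n * p * q)
    return skew, excess_kurtosis

for n in (2**5, 2**8, 2**11):
    print(n, binomial_shape(n, p=248 / 2048))   # p taken from the split above
```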

The neural net lesson taught a nice lesson about bias and variance, summed up by the figure. Yes, we are not doing machine learning, but another term for the same thing is statistical learning. We have the same problems with the statistical models we build for ourselves. We have bias and variance in our data depending on how we define our model, aka what correlations we use to define our model.

Model complexity is inversely related to bias. And, model complexity is directly related to variance. Part 2 of the Visual Introduction to Machine Learning explains this in more depth if you haven’t read it yet.

Watch the zedstatistics series on correlation. It will take some time to see how his models changed their definitions over the modeling effort. He is seeking that minimum error optimization shown in the figure. Much of it involves math, rather than data.

Given that we have pushed our don’t cares down below our cares, we set ourselves up in a sort of Cauchy distribution. Cauchy distributions have thicker tails than normals, as shown in the normal on the right. In some sense, the tail thickness is set by moving the x-axis of the normal down. Here we did that by some epsilon. In a marketing sense, that would be an upmarket move without renormalization. But, in our “don’t care” sense, the don’t cares are defining the thickness of that epsilon.

With the normal distribution shown on the right, we are defining our known as what we got from our sample, our sort-of-known as the space of the don’t cares, and our unknowns as the yet-to-be-surveyed populations. The sort-of-knowns represent our tradeoffs. We had to choose a path through the subtree, so we had to ignore other paths through the subtree. There were 32 paths, or 2^5 paths, of the 2^11 paths. Keep in mind that the don’t cares don’t mean we don’t care. Don’t cares allow us to solve a problem with a more general approach, which we usually take to minimize costs. But, in the marketing sense, it’s more that we didn’t ask yet. Once we ask and get a firm determination, we firm up one path from the 32 possible paths. We can use don’t cares to move forward before we have a definitive answer.

But, the bias and variance figure tells us something else. It tells us where in the machine learning sense the ideal solution happens to be. It is at the minimum of a parabola. In the frequentist sense, that minimum defines a specific standard deviation, or in the approach to the normal sense, that minimum tells us where our sample has become normal. It also tells us where we have become insensitive to outliers.

Once we have found the minimum, we have to realize that minimum in the development or definitional effort. Agilists would stop when they reach that minimum. Would they realize that they reached it? That is another matter. Ask if they achieved normality or not. But, the goal of machine learning is to approximate a solution with limited data, or approximating the parabola with a limited number of points on the parabola. Once you’ve approximated the parabola, finding the minimum is a mathematical exercise.
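A minimal sketch of that last step, in Python with numpy, using made-up (complexity, error) points: fit the parabola, then read the minimum off the fitted coefficients.

```python
import numpy as np

# Approximate the error-vs-complexity parabola from a handful of observed
# points, then locate its minimum analytically. The sample points are made up.
complexity = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
error      = np.array([0.90, 0.55, 0.42, 0.50, 0.80])

a, b, c = np.polyfit(complexity, error, deg=2)   # fit error ~ a*x^2 + b*x + c
optimum = -b / (2 * a)                           # vertex of the fitted parabola
print(f"estimated best complexity is about {optimum:.2f}")
```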

We can represent the product as a line through that minimum. That line would represent the base of a decision tree. I’ve represented these decision trees as triangles. Those triangles being idealizations. A generative effort in a constraint space is much messier than a triangle would suggest.

I’ve annotated the bias and variance graph with such a line. I’ve used a straight line to represent the realization. Every realization has an ontology representing the conceptualization to be realized. Every realization also has a taxonomy, but only after the realization. It boils down to ontologies before and taxonomies after. In the figure, the line from the minimum error to the baseline of the bias and variance graph is the target of the development effort. The realization line was projected and redrawn to the right. Then, the ontology and the taxonomy were added. Here the ontology and the taxonomy are identical. That is far from reality. The ontology and the taxonomy are symmetrical here, again far from reality.

The figure below the one on the right shows a messier view of a realization to be achieved over multiple releases. The solid red line has been released. There is an overall taxonomy, the enterprise taxonomy. And, there is the taxonomy of the user. The user’s effort generates some value that is significant enough to warrant continued development of the intended realization, shown as a red striped line. The user’s taxonomy is limited to the user’s knowledge of the carried content. The user’s knowledge might need to be enhanced with some training on the underlying concept. The user may not know the underlying conceptual model defined in the ontology. The developers might not know the underlying conceptual model either.

We cannot feed an ontology to a neural network. And, that neural network won’t discover that ontology. When Google wrote that Go-playing application, it discovered a way to play Go that no humans would have discovered. There are more ways to get to a realization than through ontologies and taxonomies.

The value of a realization is achieved by projecting effort through the realization. That value is evaluated relative to a point of value. That value is evaluated by some valuation baseline. Different managers in an enterprise would have different views of the realization, and different valuation baselines.

The symmetries, asymmetries, and axes of those symmetries that I highlighted are significant indicators of what must be learned, and who must learn what is being taught. Value realization is tied to what must be taught. The need to teach, like the need to design interfaces, is a signal that the underlying ontology was not known to the users, and not known and subsequently learned by the developers. The need to teach and design shows up more in products designed for sale or external use.

So what is a product manager to do? Realize that the number of samples is much larger than what Cook’s formula tells us the minimum number of samples would be. Don’t cares are useful minimizations. There is one ontology and many taxonomies. Agile assumes that the ontology will be discovered by the developer. When the UI is not straightforward, the ontology has been departed from. And, there are many views of value and many valuation baselines.

Enjoy.


Point of Value

June 21, 2018

A few days ago, someone tweeted a video where he was saying that it was all about value. We get the idea that the product is the value we are delivering, but that is a vendor-specific view. What we are really doing is providing a tool that the economic buyer purchases so their people can use it to create value beyond the tool. I’ve called this concept projecting the value through the product. It is the business case, the competitive advantage derived from use, that provides the economic buyer with value, not the product itself. This same business case can convince people in the early adopter’s two-degrees-of-separation network to buy the product, moving it across the chasm if the underlying technology involves a chasm.

An XML editor provides no value just because it was installed. The earliest version of Gartner’s total cost of ownership framework classified that install as effort not expended doing work. They called it a negative use cost. The product has not been used yet. The product has not generated any value and yet costs were accumulating. Clearly, the XML editor did not provide the owner with any value yet.

Once a user tags a document with that XML editor and publishes that document, some value is obtained by someone. The user has a point of view relative to the issue of value. And, the recipient of that value has their own point of view on the value. When the recipient uses the information while writing another report, the value chain moves the point of view on the value again, and more value accumulates.

That led me to think in terms of a value chain, the triangle model, and the projection of value. So I drew a quick diagram and redrew it several times.

In this first figure, the thick black line of the diagram on the left is the product. Different departments use the product. The use of the product is focused, and the value is delivered at the peaks of those downward-facing triangles. The value shown by the black triangles is used within the red triangles. The use inside the red triangles delivers value to the peaks of the red triangles. Notice there is a thick red line, labeled E. This represents the use of the underlying application by users outside the entities represented by the black triangles that report to the red entity. The underlying application is doing different things for users in different roles and levels.

All this repeats for the purple entities and values, and the blue entities and values. Value is projected from the interface to a point of value through work. That delivered value is projected again to the next point of value. The projections through work continue to accumulate value as the points of value are traversed.

The diagram on the right, in the top figure, diagrammatically depicts the value chaining and points of value, shown in the diagram on the left. It should be clear that the value is created through work, work enabled by the product. The product is the carrier, and the work is the carried content. The work should be entirely that of the purchaser’s users.

I’ve always thought of product as being the commercialization of some bending or breaking of constraints. I stick with physical constraints. In the figure on the left, we start with the linear programming of some process. Research developed a way to break a constraint across some limited range that I’ve called an accessibility gate. Once we can pass through that gate, we can acquire the value tied up in the accessed area (light blue).

The effort to pass through that gate involved implementing five factors. Those factors are shown as orange triangles that represent five different deliverables. Each of these factors is a different component of the software to be delivered. The order of delivery should increase the customer or client’s willingness to pay for the rest of the effort. Value has to be delivered to someone to achieve this increased willingness. Quickly delivering nothing gets us where? The thin purple curve orders the various points of value in a persuasive delivery order.

Some of the factors are not complete before they are being used and projecting some value. The projection of value is not strictly linear. The factor on the far left involves code exclusively but is the last of the factors to deliver value. For this factor, it takes three releases to deliver value to three points of value.

The other factors require use by the customer’s or client’s organization to project the desired value.

Further value is accomplished by entities remote from the product. This value is dependent on the value derived by the entities tied to the product. I’ve labeled these earlier entities as being independent. The distant projections of value are dependent on the earlier ones. It remains to be seen if any of it is independent.

The path symmetries tie into the notions of skew and kurtosis as well as projections as being subsets or crosscutting concerns. Organizational structure does not necessarily tell us about where the value accrues.

In the next figure, we take you from the user to the board member. The red rectangle represents the product. The thick black line indicates the work product projected from the user through the product. The thin red arrows represent the various changes in the points of value. The thin light blue lines show the view of the value.

At some point in the value chain, the value becomes a number and later a number in an annual report. The form of the underlying value will change depending on how a given point of value sees things. This is just as much an ethnographic process as requirements elicitation. These ethnographic processes involve implicit knowledge and the gaps associated with that implicit knowledge. Value projection is both explicit and implicit.

Enjoy.

Complex

June 11, 2018

Today, someone out on Twitter mentioned how power users insist on the complex, while the ordinary users stick with the simple. No. It’s more complicated than that. And, these days there is no excuse for the complex.

Lately, I’ve been watching machine learning videos and going to geek meetups. One guy was talking about how machine learning is getting easier as if that was a good thing. And, he is a geek. Easier, simpler happens. And, as it does, the technology can’t generate the income it used to generate. Once the average user can do machine learning without geeks, what will the geeks do to earn a living? Well, not machine learning.

The technology adoption lifecycle is organized by the pragmatism of the managers buying the stuff and the complications and simplicities of the technology. The technology starts out complicated and gets simpler until it vanishes into the stack. It births a category when it’s discontinuous, aka a completely new world, and it kills the category once it has gotten as simple as it can be. The simpler it gets, the less money can be made, so soon enough everybody can do it, and nobody can make any money doing it. We add complications so we can make more money. Actually, we don’t. Things don’t work that way.

So I drew a technology adoption lifecycle (TALC) vertically. I’ve modified the place of the technical enthusiasts in the TALC. They are a layer below the aggregating mean. They span the entire lifecycle. I left Moore’s technical enthusiasts at the front end of the vertical. And, I’ve extended the technical enthusiasts all the way out to the efforts prior to bibliographic maturity.

Complicated

I used the word “Complicated” rather than complex. Complicated is vertically at the top of the figure. Simpler is at the bottom. The left edge of the technical enthusiast slice of the normal is the leading edge of the domain where the complicated, the complex, is encountered. The complex can be thought of like constraints. Once you simplify the complex there is more complexity to simplify. The vertical lines represent consecutive simplifications. Where there are many vertical lines, the complications are those of the people working on the carrier aspects of the complexity. I drew a horizontal line to separate the early and late phases. I did this to ghost the complexity grid. There is more than enough going on in the distribution itself. The vertical lines below that horizontal line are the complexity lines related to the TALC phases on the right side of the TALC, to the right of the mean, to the right of the peak. Or in this figure, instead of the usual left and right, think above and below.

In the diagram, I put “Simpler” above (to the right of) “Complicated.” This is then labeled “Simpler 1.” We are still in the lab. We are still inventing. This simplification represents the first task sublimation insisted on by the TALC. This task sublimation happens as we enter into the late mainstreet, consumer phase. Technical enthusiasts don’t need simpler. But, to move something out of the IT horizontal into broader use, it has to get simpler.

Simpler is like graph paper. “Simpler 1” is distant from the baseline and aligned with the TALC phases, although the diagram separates them for clarity, hopefully.

The device phase, aka the phase for the laggard population, absolutely requires technology that is far simpler than what we had when we moved the underlying technology into the consumer phase, late mainstreet. Devices are actually more complicated because the form factor changes and an additional carrier layer gets added to everything.  The orange rectangle on the left of the device phase is the telco geeks and their issues. The carried content gets rewritten for simpler UI standards. The tasks done on a device shouldn’t be the same as those done on a laptop or a desktop. The device phase presents us with many form factors. Each of those form factors can do things better than other form factors. But, again, the tasks done on each would be limited.

In Argentine tango, when you have a large space in which to dance, you can dance in the large. But, when the crowd shows up or the venue gets tiny, we tighten up the embrace and cut the large moves. Our form factor shrinks, so our dance changes.

How would basketball feel if it was played on a football field?

The cloud phase, aka the phase for the phobic population, requires technology that is totally hidden from them. They won’t administer, install, upgrade, or bother in the least. The carrier has to disappear. So again the UI/UX standards change.

The phase specificity of the TALC should tell us that each phase has its own UI standards. With every phase, the doing has to get simpler. The complexities are pushed off to the technical enthusiasts who have the job of making it all seem invisible to the phobics, or simple to the laggards, or somewhat simpler to consumers.

Task sublimations, simplifications, are essential to taking all the money off the table. If we get too simple too fast, we are leaving money on the table. When we skip the early phases of the TALC and jump into the consumer phase, we are leaving money on the table.

But, being continuous innovations, we don’t bother with creating value chains, and careers. They get the technical enthusiasts jobs for a few months. They get some cash. The VCs get their exit. It has to be simple enough for consumers. More simplifications to come. But, the flash in the pan will vanish. Continuous innovations don’t put money on the table. That money is on the floor. Bend your knees when picking it up.

Technical enthusiasts should not cheer when the technology gets simplified. Maybe they need it to get simpler, so they can use it. But, it is going to continue to get simpler. And, real science in the pre-bibliographic-maturity stage will be complex or complicated. It won’t get more complicated. It will get simpler. Simpler happens.

That doesn’t mean that everything has to be in the same simplicity slice. It just means that the simplicity must match the population in the phase we sell into.

One complication that doesn’t show up in the diagram is that the TALC is about carrier except in the bowling alley. In the bowling alley, the carried content is what the customer is buying. But, that carried content is a technology of its own, so the carrier TALC and the carried TALC meet in the bowling alley. Each of those technologies gets simpler at its own rate. These intersections show up in late mainstreet when you want to capture more of the business from the vertical populations. This is a real option. But, it will take quite an effort to hold on to the domain-knowledgeable people.

The diagram covers much more ground. Today, we just called out the complicated and the simple.

Enjoy!

Fourth Definition of Kurtosis

June 6, 2018

In the Wikipedia topic on Moment, Kurtosis being the fourth moment, aka the fourth derivative of the moment generating function, Wikipedia says, “The fourth central moment is a measure of the heaviness [or lightness] of the tail of the distribution, compared to the normal distribution of the same variance.” Notice here, no mention of peakedness.
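For reference, here is the standard formulation of that fourth central moment and the kurtosis built from it; this is a textbook statement, not a quote from the Wikipedia article.

```latex
\mu_4 = \operatorname{E}\!\left[(X-\mu)^4\right], \qquad
\operatorname{Kurt}[X] = \frac{\mu_4}{\sigma^4}
  = \frac{\operatorname{E}\!\left[(X-\mu)^4\right]}{\left(\operatorname{E}\!\left[(X-\mu)^2\right]\right)^{2}}, \qquad
\text{excess kurtosis} = \operatorname{Kurt}[X] - 3.
```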

In Yes or No in the Core and Tails II, I included some discussion of mixture models with a two-dimensional graphic that illustrated the summing of two distributions. The sum (red) was said to have a heavy tail. It was interesting to see distributions in a mixture model acting as constraints. I have not been able to confirm that normals in other sums act as constraints. In a mixture model, the weights of the summed normals must add up to 1, so one normal has a weight of p, and the other would have a weight of 1-p. The yellow areas represent the constrained space. The red distribution is sandwiched between the green one and the blue one. The green normal and the blue normal are constraining the red normal.
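A minimal sketch of such a mixture, in Python with scipy, using made-up means, standard deviations, and a weight p; it shows the weighted sum and the heavier right tail contributed by the wider component.

```python
import numpy as np
from scipy.stats import norm

# A two-component mixture: the weights p and 1 - p must sum to one. The
# mixture density (the "red" curve) is the weighted sum of the components;
# parameters here are made up for illustration.
p = 0.7
x = np.linspace(-6, 10, 1601)
component_1 = norm.pdf(x, loc=0, scale=1)
component_2 = norm.pdf(x, loc=3, scale=2.5)     # the wider, heavier-tailed piece
mixture = p * component_1 + (1 - p) * component_2

# Far out in the right tail the wide component dominates: the heavy tail.
print(mixture[-1], component_1[-1], (1 - p) * component_2[-1])
```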

In analysis, distribution theory is not about statistics, but rather as substitutes for functions. In linear programming, constraints are functions, so it should be of no surprise that distributions act as constraints. Statistics is full of functions like the moment function. Every time you turn around there is a new function describing the distribution. Those functions serve particular purposes.

Another view of the same underlying graph shows these normals to be events on a timeline, the normal timeline. Statistics lives in fear of p-hacking, or waiting around and continuing to collect data until statistical significance is achieved. But, what if you are not doing science? P-hacking wouldn’t pay if the people doing it were trying to make some money selling product, rather than capturing grant money. Statistics takes a batch approach to frequentist statistical inference. Everything is about data sets, aka batches of data, rather than data. But, if we could move from batch to interactive, well, that would be p-hacking. If I’m putting millions on a hypothesis, I won’t be p-hacking. If I’m putting millions on a hypothesis, I won’t use a kurtotic or skewed distribution that will disappear in just a few more data points or the next dataset. That would just be money to lose.

So what is a normal timeline? When n is low, shown by the green line in the figure, labeled A, the normal is tall and skinny, ideally, because it is also skewed and kurtotic, which is not shown in this figure. We’ll ignore the skew and kurtosis for the moment. When n is finally high enough to be normal, shown by the red line, it is no longer tall, and not yet short. It is a standard normal. When n is higher, shown by the blue line, labeled B, the distribution is shorter and wider. So we’ve walked a Markov chain around the event of achieving normality and exceeding it. This illustrates a differential normality.

We achieve normality, then we exceed it. This is the stuff of differentials. I’ve talked about the differential geometry previously. We start out with Poisson games on the technology adoption lifecycle. These have us in a hyperbolic geometry. We pretend we are always in a Euclidean space because that is mathematically easy. But, we really are not achieving the Euclidean until our data achieves normality. Once we achieve normality, we don’t want to leave the Euclidean space, but even if we don’t, the world does, our business does. Once the sigma goes up, we find ourselves in a spherical geometry. How can so many businesses exist that sell the same given commodity in a multiplicity of ways? That’s the nature of the geodesic, the metric of spherical geometry. In a Euclidean space, there is one optimal way; in hyperbolic, less than one optimal way; and spherical, many. This is the differential geometry that ties itself to the differential normality. The differential normality that batch statistics, datasets hide. A standing question for me is whether we depart the Euclidean at one sigma or six sigma. I don’t know yet.

As a side note on mixture models like the underlying figure for the figures above, these figures show us normals that have a mean of zero, but their standard deviations differ. The first standard deviation is at the inflection point on each side of the normal distribution. The underlying figure is tricky because you would think that all three normals intersect at the same inflection point. That might be true if all three had the same standard deviation. Since that is not the case, the inflection points will be in different places. The figure shows the inflection points on one side of the normal. When the distribution is not skewed, the inflection points on the other side of the mean are mirror images.

Mixture models can involve different distributions, not just normals. Summing is likewise not restricted to distributions having the same mean and standard deviations or being of the same kind of distributions.

Multivariable normals contain data from numerous dimensions. A single measure is tied to a single dimension. A function maps a measurement in a single dimension into another measurement in another dimension. Each variable in a multivariable normal brings its own measure, dimension, and distribution to the party. That multivariable normal sums each of those normals. Back in my statistics classes, adding normals required that they have the same mean and same standard deviation. That was long ago, longer than I think.

Enjoy.