Archive for August, 2019


August 28, 2019

Twitter brought it up again, n-dimensional packing, with a link to An Adventure in the Nth Dimension in American Scientist. An earlier article in Quanta Magazine, Sphere Packing Solved in Higher Dimensions, kept the problem in mind.

So why would a product manager care? Do we mind our dimensions? Do we know our parameters? Do we notice that the core of our normal distribution is empty? Are our clusters spherical or elliptical? Do we wait around for big data to show up while driving our developers to ship?

I replied to a comment on the first article. The article never touched on the fact that pi is not a constant. The "pi is a constant" assertion lives in L2, the particular Lp space where p=2, which is our familiar Euclidean space. When we assert a random variable we are in L0. Our first dimension puts us in L1; our second, L2; our third, L3; and so forth.

I’ve been drawing circles as the footprints of my normal distributions. Unless I specifically meant a two-dimensional normal, they should have been squircles. A squircle is a squared-off circle, or a square with circular corners.
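To see the difference, here is a quick NumPy sketch (my own illustration, not from the article) that traces the unit "circle" |x|^p + |y|^p = 1 for p = 2 and p = 4:

```python
import numpy as np

def lp_unit_curve(p, n=400):
    """Points on the Lp unit 'circle' |x|^p + |y|^p = 1."""
    t = np.linspace(0.0, 2.0 * np.pi, n)
    # Signed-power parametrization: with x = sign(cos t)|cos t|^(2/p) and
    # y = sign(sin t)|sin t|^(2/p), we get |x|^p + |y|^p = cos^2 t + sin^2 t = 1.
    x = np.sign(np.cos(t)) * np.abs(np.cos(t)) ** (2.0 / p)
    y = np.sign(np.sin(t)) * np.abs(np.sin(t)) ** (2.0 / p)
    return x, y

# p = 2 traces the familiar circle; p = 4 traces a squircle that bulges
# out toward the corners of its bounding square.
x2, y2 = lp_unit_curve(2)
x4, y4 = lp_unit_curve(4)
```

The p = 4 curve hugs its bounding square: that is the squircle I should have been drawing.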

The red object is a squircle in L4. That is the fourth dimension. The n here refers to the dimension.

The blue object is a circle in L2. We could also consider it to be a squircle in L2.

If they are both footprints of normal distributions, then the blue distribution would be a subset of the red distribution. Both have enough data points separately to have achieved normality. Otherwise, they would be skewed and elliptical.

The L2 squircle might be centered elsewhere and it might be an independent subset of the superset L4. That would require independent markers that I discussed in the last post. Independence implies an absence of correlation. There is no reason to assume that the footprints of independent subsets share the same base plane.

The reason I added a circle to the diagram of the L4 squircle was to demonstrate that the circumference of the L4 squircle is larger than that of the L2 squircle, aka the circle. Given that π is defined as the ratio of the circumference to the diameter, π = C/d = C/(2r), this implies that every Lp space has a unique value for π. This was not discussed in the article that led to this blog post. It turns out that the dimension n parameterizes the shape of the footprint of the normal distribution.
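Here is a rough numerical check of that claim (my own sketch): measure the circumference of the Lp unit circle in the Lp norm itself, then divide by the diameter, 2.

```python
import numpy as np

def pi_p(p, n=100_001):
    """Estimate pi in the Lp plane: half the circumference of the Lp unit
    circle, with both the circle and the arc length measured in the Lp
    norm. The unit circle's diameter is 2, so pi_p = C_p / 2."""
    t = np.linspace(0.0, 2.0 * np.pi, n)
    x = np.sign(np.cos(t)) * np.abs(np.cos(t)) ** (2.0 / p)
    y = np.sign(np.sin(t)) * np.abs(np.sin(t)) ** (2.0 / p)
    dx, dy = np.diff(x), np.diff(y)
    # Sum the Lp lengths of the small segments along the curve.
    circumference = np.sum((np.abs(dx) ** p + np.abs(dy) ** p) ** (1.0 / p))
    return circumference / 2.0
```

The estimate lands at the familiar 3.14159… for p = 2, at exactly 4 for p = 1 (the diamond), and above π for p = 4, so each Lp space really does carry its own π, minimized at p = 2.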

The dimension n would differ between supersets and subsets. Each dimension achieves normality on its own, so don't assume normality. Know which tail is which if a dimension is not yet normal; every dimension has two tails until normality is achieved. This implies that an aggregate normal that has not achieved normality in every dimension is not symmetric.
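A minimal way to check each dimension on its own, assuming for illustration one dimension that has achieved normality and one that is still skewed, is to look at sample skewness per dimension:

```python
import numpy as np

def sample_skewness(x):
    """Sample skewness: roughly zero once a dimension has achieved
    normality; far from zero when one tail is heavier than the other."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(0)
normal_dim = rng.normal(size=50_000)       # this dimension is normal
skewed_dim = rng.exponential(size=50_000)  # this one is not there yet

# Test each dimension separately rather than assuming joint normality.
```

The skewed dimension's sign tells you which tail is the heavy one, which is the "know which tail is which" step above.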

Lp spaces are weird. When the dimension is not an integer, that space is fractal.

The normal distribution has a core, a shoulder, and a tail. Kurtosis is about shoulders and tails; that is a relatively new view of the purpose of kurtosis. More importantly, the core is empty. The mean might be a real number even when the data are integers. The mean is a statistic, not necessarily a data point.
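A tiny numeric illustration of the empty core (my own quick implementation): the mean of integer data need not be one of the data points, and excess kurtosis is what compares shoulders and tails against the normal's.

```python
import numpy as np

# Integer data whose mean is not itself a data point: the core is empty.
observations = np.array([1, 2, 2, 3, 5, 8])
mean = observations.mean()   # a statistic, not an observation

def excess_kurtosis(x):
    """Excess kurtosis: zero for the normal; positive means heavier tails
    and narrower shoulders, negative the reverse."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(1)
normal_kurtosis = excess_kurtosis(rng.normal(size=200_000))
```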

When we talk about spheres, the high-dimensional sphere is empty. As the dimension increases, the probability mass migrates to the corners, which become spikes on the high-dimensional sphere. There is some math describing that migration. The spikes are like the absolute value function in that they are not smooth. There is no smooth surface covering the sphere; it's one point to another, one tip of a spike to the next. You have to jump from one to the next. Do we see this with real customers? Or with real requirements?
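The emptiness is easy to see numerically. This Monte Carlo sketch (an illustration of the migration, not the article's math) estimates how much of the cube [-1, 1]^n its inscribed ball occupies:

```python
import numpy as np

def ball_fraction(dim, samples=200_000, seed=0):
    """Monte Carlo estimate of the share of the cube [-1, 1]^dim occupied
    by the inscribed unit ball. As dim rises, nearly all of the cube's
    volume migrates out to its corners, away from the ball."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(samples, dim))
    inside = np.sum(pts ** 2, axis=1) <= 1.0   # inside the unit ball?
    return float(inside.mean())
```

At n = 2 the ball fills about 78% of the square; by n = 10 it is under one percent, with the rest of the volume out in the corners.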

Sphere packing with spiky spheres means that we can compress the space, since the spikes interleave. In our jumping from one spike to the next, and from one sphere to another, how will that make sense to a user?

This graph from the American Scientist article is the historical flight envelope of sphere packing. Apparently, nobody had gone beyond 20 integer dimensions. The spheres look smooth as well.

I took statistics decades ago, when statistics was a hot topic and much work was being done. I'm surprised by the parameterizations that have happened since then. Lp space is indexed by n, the number of dimensions, which is a parameter. Things that we thought of as constants have become parameters.

Parameters are axes, aka dimensions. Instead of waiting until your data pushes your distribution to a particular parameter value, you can set the parameter, generate the distribution, and explore your inferential environment under that parametric circumstance. The architect Peter Eisenman used generative design: he specified the parameters or rules and observed his CAD system animate a building defined by them. Similarly, you can check your strategies in the same way, long before you have the data or the illuminators that lead to that data.
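A minimal sketch of that idea, using Student-t degrees of freedom as a stand-in for the parameter being swept (my choice of stand-in, not Eisenman's): set the tail-weight parameter, generate the distribution, and inspect the consequences before any data arrives.

```python
import numpy as np

def generate(tail_parameter, size=100_000, seed=0):
    """Generate a distribution under a chosen parameter value instead of
    waiting for data to reveal it. Here the parameter is the Student-t
    degrees of freedom, which sets the tail weight."""
    rng = np.random.default_rng(seed)
    return rng.standard_t(df=tail_parameter, size=size)

# Sweep the parameter and inspect the inferential consequences up front:
# how much probability mass sits out beyond three units?
tail_mass = {df: float(np.mean(np.abs(generate(df)) > 3.0))
             for df in (3, 10, 100)}
```

The sweep shows the heavy-tailed setting (df = 3) carrying far more mass past three units than the near-normal one (df = 100), which is the kind of strategic consequence you can explore before the data exists.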

Much of the phase change that we call the technology adoption lifecycle involves independent markers, or data that never got into our data. It is all too easy to Agile our way into code for a population that we shouldn't be serving yet. The mantras about easy fail to see that easy might have us serving the wrong population. The cloud is the easiest; it is for phobics, not the early main street. Our data won't tell us, because we were not looking for it. This happens with or without big data.

The more we know, the less we knew. We didn’t know π was a parameter.


Independent Markers

August 11, 2019

Well, as usual, Twitter peeps posted something, so I dove in and discovered something I've barely had time to explore. Antonio Gutierrez posted Geometry Problem 1443. Take a look.

It is a problem about the area of the large triangle and the three triangles that compose it.

A triangle has many centers. The circumcenter is the center of the circle circumscribed around the large triangle; I've labelled that circle as the superset. The incenter is the center of the circle inscribed inside the large triangle; that circle is the subset.

It doesn't look like a statistics problem, but when I saw the symbol for perpendicularity implying that the subset is independent of the superset, it quickly became a statistics problem and a product marketing problem.

The line AB is not a diameter, so the angle ACB is not a right angle. If AB were a diameter, angle ACB would be a right angle. The purple lines run through the circumcenter, the center of the circle representing the superset, which implies that the purple lines are diameters. I drew them because I was thinking about the triangle model, where triangles are proofs, and I checked it against my game-theoretic model of generative games. The line AB is not far from the middle diameter line, which is enough to say that the two thin red lines might converge at a distant point. As the line moves further from the diameter, the lines will converge sooner. Generally, constraints bring about the convergence: the large triangle is a development effort, and the point C is the anchor of the generation of the generative space. The generative effort's solution is the line AB. The generative effort does not move the populations of the subset or the superset.

O is the circumcenter of the large triangle. A line from O to A is a radius of the large circle representing the superset. I is the incenter of the large triangle. A line from I to D is a radius of the small circle representing the independent subset.

Now for a more statistical view.

When I googled independent subsets, most of the answers said no. But I found a book, New Frontiers in Graph Theory, edited by Yagang Zhang, that discussed how a subset could be independent. I have not read it fully yet, but the discussion centers around something called markers. The superset is a multidimensional normal. The subset is likewise, but the subset contains markers, these being additional dimensions not included in the superset. Markers adjust a distribution's x-axis relative to the y-axis, something you've seen if you've read my later posts on black swans. This x-axis shift, or movement of the distribution's base, is also what happens with Christensen disruptions, aka down-market moves. In both black swans and Christensen disruptions, the distribution's convergences with the x-axis move inward or outward.

In the above figure, we have projected from the view from above to a view from the side. The red distribution (with the gray base), the distribution of the subset, is the one that includes the markers. The markers sit below the base of the superset. The markers are how the subset obtains its independence: the marker dimensions are not included in the superset's multidimensional normal, and the marker axes are not on the same plane as those of the superset.
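As a sketch of how I read the marker idea (my own illustration, not the book's construction): draw a subset from the superset's shared dimensions, then staple on a marker dimension the superset never measured. The marker stays uncorrelated with the shared dimensions, which is what buys the subset its independence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Superset: a two-dimensional normal over the shared dimensions.
superset = rng.normal(size=(10_000, 2))

# Subset: rows drawn from the superset, plus a marker dimension that the
# superset never measured; its axis is not on the superset's base plane.
rows = rng.choice(10_000, size=2_000, replace=False)
marker = rng.normal(size=2_000)
subset = np.column_stack([superset[rows], marker])

# The marker is uncorrelated with the shared dimensions.
corr = np.corrcoef(subset, rowvar=False)
```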

Now, keep in mind that I have not yet read the book on these markers and independent subsets; this is my own solution. I see dimensions as axes related by an ontological tree. Those markers would be ontons in that tree. Once realized, ontons become taxons in another tree, a taxonomic tree.

Surveys live long lives. We add questions; each question could address a new taxon, a new branch in the tree that is the survey. We delete questions. Data enters and leaves the distribution, or, in the case of markers, disappears below the plane of the distribution.

Problems of discriminatory biases embedded in machine learning models can be addressed by markers. Generative adversarial networks are machine learning models that use additional data to grade the original machine learning model. We can call those data markers.

I am troubled by perpendicularity implying independence. The x-axis and the y-axis are perpendicular, at least until you work with non-orthogonal bases in linear algebra. But the symbol for perpendicularity did not lead me down a rabbit hole.