Posts Tagged ‘tori’

Miscellaneous, 1/26/2021

January 27, 2021

Central Limit Theorem

I watched some YouTube videos on the central limit theorem in which, according to that theorem, a population can be sampled with samples of size 30. The presenter implied that you take as many such samples as needed to cover the population. But the point was that each sample would have 30 entities in it.

I don’t know, but 30 seems too small. It is nowhere near 2^11. Even 2^11 is too small to get us to a symmetric normal, one without skew and kurtosis.
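A quick simulation makes the worry concrete. This is a minimal sketch, not anything from the videos: it assumes an exponential population (a deliberately skewed choice) and compares the skew of sample means at n = 30 against n = 2^11.

```python
import random
import statistics

random.seed(42)

def sample_mean(draw, n):
    # Mean of one sample of size n drawn from the population.
    return statistics.fmean(draw() for _ in range(n))

def skewness(xs):
    # Sample skewness: mean of ((x - m) / sd)^3.
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / sd) ** 3 for x in xs)

# A heavily skewed population: exponential with mean 1 (an assumption
# chosen to make the residual skew easy to see).
draw = lambda: random.expovariate(1.0)

means_30 = [sample_mean(draw, 30) for _ in range(1000)]
means_2048 = [sample_mean(draw, 2**11) for _ in range(1000)]

print(skewness(means_30))    # noticeable residual skew at n = 30
print(skewness(means_2048))  # much closer to 0 at n = 2^11
```

For an exponential population the skew of the sample mean is 2/sqrt(n), so n = 30 still leaves visible asymmetry while n = 2^11 is nearly symmetric, which is the point being made above.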

I drew a picture with my blunt tools. I didn’t use a sphere packing algorithm. I just drew a multiplication table. A surprise jumped out.

A normal has a circular footprint. A normal sits inside a square. So what shape are we talking about here? How do we get to 30? 5^2, or 25, is too small, and 6^2, or 36, is too large. We are talking about people here, not, say, populations of the square root of 30.

The red lines are ellipses sitting inside rectangles. They are not normals yet. They are pre‑normals or long‑post‑normals. They are either hyperbolic or spherical. And, somehow, according to the central limit theorem, when added together, they add up to a standard normal. That implies that their mean is 0 and their standard deviation is 1.

A circle implies the absence of a correlation. A rectangle implies the presence of a correlation, or a bias.

Notice that the samples for 1 and 36 are outside the circle. They are omitted from the population. Oh well.

Synthetic Data

Mashhood Ahmed’s discussion of synthetic data came up again on LinkedIn. See the discussion.

Project management has long been a research topic. Software engineering research is similar. The data justifying and validating the practices has existed for a long time. If anybody goes looking for data, it exists. Yes, maybe your way is different. But getting consistent data for yourself and your organization should be simple. Capture it. Analyze it. Integrate it.

Once you know the parameters and the constraint envelopes, you can generate synthetic data on those parameters and constraints. You can run a real project using those synthetic parameters and constraints and then see what your organization delivers. Capturing your outcomes lets you forecast where the parameters and constraints will take you before you go.
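A minimal sketch of that idea, with every parameter hypothetical: once you have captured parameters (here, lognormal task durations) and a constraint envelope (a floor and a ceiling in days), generating synthetic data on them is a few lines.

```python
import random
import statistics

random.seed(7)

# Hypothetical parameters captured from past projects (assumptions,
# not real data): lognormal task durations, bounded by a constraint
# envelope of 1 to 40 days.
MU, SIGMA = 2.0, 0.5        # log-space parameters
LOW, HIGH = 1.0, 40.0       # constraint envelope, in days

def synthetic_duration():
    # One synthetic task duration, clipped to the envelope.
    d = random.lognormvariate(MU, SIGMA)
    return min(max(d, LOW), HIGH)

tasks = [synthetic_duration() for _ in range(10_000)]
print(round(statistics.fmean(tasks), 2),
      round(min(tasks), 2),
      round(max(tasks), 2))
```

Run a project against data like this, capture the real outcomes, and the gap between the synthetic parameters and the delivered ones is your forecast correction.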

When I talk about the technology adoption lifecycle, I know what my distributions will look like before I have any customers. I know what the processes are going to be. I know the evolution. My current obsession with regressions to the tail is just a matter of knowing what I can expect and knowing what to do about it when I see it coming. This business of financial turbulence bleeding into adjacent processes and dependent processes is trouble. How do you put that in a box? How do you deal with the coupling and cohesion of that turbulent system? How do you make it an object?

I build my long tail representing my application as I build the application. Use tells me about my tails. Does the tail match up with the requested requirements? The resulting tails validate the survey data that led to the requirements. The resulting tails confirm the marketing organization’s delivery of the appropriate users. With a user interface, there is plenty of ongoing data collection. Call them surveys if you like.

In too many correlation classes, a given correlation is arbitrarily thrown away. It is actually a component of a tail. There are many tails. And, I’ve seen one system of correlations get replaced by another, assuming that there is only one tail. When a UI control asks you to select one of three possible choices, there are three tails. Or, is that question some pointless data to be stored? Is that choice eventually expressed by some component of the system? Three choices give you three different probabilities, and three departing Markov chains of probabilities to add to the tail of the predecessor that, assuming only one tail, led us to that control. In AI, the overall UI would have been a small world.
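A toy sketch of the three-tails point, with every probability hypothetical: each choice on the control launches its own departing chain, so the control contributes three tails, not one.

```python
# Hypothetical selection probabilities for a three-choice UI control.
choice_probs = {"a": 0.5, "b": 0.3, "c": 0.2}

# Hypothetical per-step continuation probability for each departing
# Markov chain of follow-on interactions.
continue_prob = {"a": 0.9, "b": 0.7, "c": 0.4}

def tail(choice, steps):
    # Probability of reaching step k along one departing chain.
    return [choice_probs[choice] * continue_prob[choice] ** k
            for k in range(steps)]

# Three choices, three departing tails to add to the predecessor's tail.
tails = {c: tail(c, 4) for c in choice_probs}
for c, t in tails.items():
    print(c, [round(p, 3) for p in t])
```

The three chains decay at different rates, which is why collapsing them into a single tail throws information away.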

Knowing your expected distributions and putting synthetic data in them should not be a problem.

Open Source Software

Today, I came across a job description. They wanted a product manager for a product that aims to replace Dreamweaver. The product was written for programmers. We used Dreamweaver. Were we programmers? The product is open-source software.

Open-source software development is supposed to deliver better software than other development processes. It does this because the programmer is a member of the user community. That programmer knows the carried content. Most programmers know the carrier but have to be taught the carried content. These latter programmers are not users. Those two types of programmers present us with very different propositions.

Hell, I remember an organization that produced carrier-type products. That company defined the world. They wanted to become a product company. That meant listening to the outside world, listening to users and others that defined their world, their carried content. In the end, they could not make that leap.

Barcodes and Persistent Homology

In my YouTube watching, I revisited barcodes. I got it this time. Start with a collection of points. Each of those points is the center of a circle of a given radius. All the circles are the same size. Increase the radius of all those circles. The circles begin to overlap at some radius. Continue to increase the radii. Some space gets surrounded. That surrounded space is a hole. When that happens, a hole is born. Continuing to increase those radii, the surrounded space, the hole, disappears. That marks the death of the hole. That happens at some radius.

The barcode starts at the radius where the hole is born. The barcode ends at the radius where the hole dies. That hole is exhibiting a lifecycle. That hole is a topological hole, not an algebraic hole. Tori and cyclides are topological structures that have holes. They show up in the curvatures of the tails around a normal over the lifecycle of that normal. Barcodes tell us how large the holes in those topological structures happen to be.
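The birth and death radii can be worked out by hand for an idealized point cloud. This sketch, not a persistent-homology library, assumes n points evenly spaced on a circle of radius R: the growing disks first touch their neighbors when 2r equals the adjacent chord, which births the hole, and the hole dies once the disks reach the center, at r = R.

```python
import math

# Idealized point cloud: n points evenly spaced on a circle of radius R
# (assumed values for illustration).
n, R = 12, 10.0

adjacent_chord = 2 * R * math.sin(math.pi / n)
birth = adjacent_chord / 2   # disks first overlap their neighbors
death = R                    # disks swallow the enclosed hole

print(round(birth, 3), round(death, 3))
barcode = (birth, death)     # the bar for this one topological hole
```

The bar, from birth to death, is exactly the interval of radii over which the hole exists.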

In the figure above, on the left, I put in a point cloud of blue points. These points are in a multidimensional space. I drew my first circles of radius r1. I cheated and moved the points, so they enclosed that big red space, our hole. The blue points generated a deformed torus. On the right, I drew a circle six points larger. That is radius r2. I overlaid those circles on the earlier ones. I failed to cover the hole. You can still see a red area in the center. Radius r2 needs to be one point larger. I added that point as shown. I did test the figure on the left, but my tools subtracted two points instead of one.

If the points came from survey data for a particular requirement, the barcode would show you the requirement’s lifecycle. In the figure on the left, removing point A representing a customer or user would prevent that hole’s birth.

If you could find the rate involved in the process of moving from r1 to r2, you could put a date on the birth and death of that requirement. The radius r1 tells you how much time you have to deliver that requirement from the start of the demand in your survey data.
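A sketch of that conversion, under an assumed rate: if the covering radius grows at a hypothetical 0.5 radius units per week, the barcode endpoints convert directly to calendar time.

```python
# All numbers here are assumptions for illustration.
rate = 0.5                 # radius units per week (hypothetical rate)
r1, r2 = 2.588, 10.0       # birth and death radii read off a barcode

weeks_to_birth = r1 / rate   # when the requirement's demand appears
weeks_to_death = r2 / rate   # when that demand dissipates
window = weeks_to_death - weeks_to_birth

print(round(weeks_to_birth, 1), round(weeks_to_death, 1), round(window, 1))
```

The birth date is your deadline to start delivering; the window is how long the requirement stays alive.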

Enjoy!

Poincaré Disk

September 13, 2020

The Poincaré disk is one model of hyperbolic space. Try it out here.

Infinity is at the edge of the Poincaré disk, aka the circle. The Poincaré disk is a three-dimensional bowl seen from above. Getting where you want to go requires traveling along a hyperbolic geodesic. And projecting a future will understate your financial outcomes. Discontinuous innovation happens here.

A long time ago, I installed a copy of Hyperbolic Pool. I played it once or twice. Download it here. My copy is still installed. Alas, it did not work when I tested it from this post. My apologies. Hyperbolic space was a frustrating place to shoot some pool.

I’ve done some research. More to do.

A few things surprised me today. The Wikipedia entry for Gaussian curvature has a diagram of a torus. The inner surface of the hole exhibits negative curvature. The outer surface of the torus exhibits positive curvature. That was new to me.

I’ve blogged on tori and cyclides in the context of the long and short tails of pre-normal normal distributions, aka skewed, kurtotic normal distributions that exist before normality is achieved. These happen while the mean, median, and mode have not converged. I’ve claimed that the space where this happens is hyperbolic: it begins at the random variable’s birth, with the Dirac function that happens when the random variable is asserted into existence, and continues until the distribution becomes normal.

Here are the site search results for

There will be some redundancy across those search results. In these search results, you will find that I used the term spherical space. I now use the term elliptical space instead.

We don’t ever see hyperbolic space. We insist that we can achieve normality in a few data points. It takes more than 2^11 data points to achieve normality. We believe the data is in Euclidean “ambient” space. We do linear algebra in that ambient space, not in hyperbolic space. Alas, the data is not in ambient space. The space changes. Euclidean space is fleeting: waiting at n-1, arriving at n, departing at n+1, but it is computationally convenient. Maybe you’ll take a vacation, so the data collection stalls, and until you get back, your distribution will be stuck in hyperbolic space waiting, waiting, waiting to achieve actual normality.

Statistics insists on the standard normal. We assert it. Then, we use the assertion to prove the assertion.

Machine learning, being built on neurons and neural nets, insists on the ambient space because Euclidean space is all its neurons and neural nets know. Euclidean space is convenient. Curvature in machine learning is all kinds of inconvenient. Getting funded is not just a convenience. It might be the wrong thing to do, but we do much wrong these days. Restate your financials so the numbers for the future, from elliptical space, paint a richer future than the hyperbolic numbers your accounting system just gave you.

And one more picture. This one is from an n-dimensional normal, a collection of Hopf-fibered, linked tori. Fibered, I get, but I’ve stayed out of it so far. Linked happens, but I’ve yet to read all about it.

The thin torus in the center of the figure results from a standard normal in Euclidean space. Its distribution is symmetrical. Both of its tails are on the same dimensional axis of the distribution. They have the same curvature. The rest of the dimensions have a short tail and a long tail. Curvature is the reciprocal of the radius. The fatter portions of the cyclides represent the long tails. Long tails have the lowest curvatures. The thinner portions of the cyclides represent the short tails. Short tails have the highest curvatures. Every dimension has two tails, in the we-can-only-visualize-in-2D sense. These tori and cyclides are defined by their tails.
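The reciprocal relation is worth spelling out. A minimal sketch with illustrative, not measured, radii:

```python
# Curvature kappa = 1 / r, so a long tail (large radius) has low
# curvature and a short tail (small radius) has high curvature.
# These radii are hypothetical, chosen only to show the relation.
long_tail_radius = 8.0
short_tail_radius = 2.0

kappa_long = 1 / long_tail_radius    # low curvature, the fat side
kappa_short = 1 / short_tail_radius  # high curvature, the thin side

print(kappa_long < kappa_short)  # longer tail, lower curvature
```

That single reciprocal is why the fat and thin sides of the cyclide encode the long and short tails respectively.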

Keep in mind that the holes of the tori and cyclides are the cores of the normals. The cores are not dense with data. Statistical inference is about tails. And regressions to the tail are about tails, but in the post-Euclidean, elliptical-space, n+m+1-data-point sense. One characteristic of regressions to the tail, aka thick-tailed distributions, is that their cores are much denser than that of the standard normal.

Hyperbolic space will only show up on your plate if you are building a market for a discontinuous innovation. Almost none of you do discontinuous innovation, but even continuous innovation involves elliptical space, rather than the ambient Euclidean space or the space of machine learning. We pretend that Euclidean space is our actionable reality. Even with continuous innovation, the geometry of that space matters.

Enjoy!