From Conspiracy to Receipts

Yes, the title sounds like a conspiracy theory.

But let me say in plain words what they refuse to admit: they are trying to reach the moon by building a taller ladder, scaling flat transformers, throwing hundreds of billions of dollars a year in brute-force computation at a problem the architecture cannot solve at any cost.

Meanwhile they hide the blueprint they are actually working on, the rocket ship that is Geometric AI, which would save trillions by handing out the whole map instead of one street address at a time.

Of course they are telling you the opposite story. The core of it is much simpler than the trillion-dollar price tag suggests: at the heart of current AI sits a geometrical problem, and no GPU farm in the world is going to fix it, not even with the nuclear-powered data center the size of Patagonia they have started building.

Stay with me. Because you will see how this apparently crazy absurdity has a point of sanity hidden inside the labs themselves: their research teams are following a different approach to the sola computatione of the current AI church.

The smoking gun is neither a secret server farm nor a leaked memo. And least of all, it is not some cinematic superintelligence hidden behind a locked door. It is something much colder and more practical.

The back door is inside the technical and scientific papers of the companies themselves, which, except for some expert nerds, almost nobody outside the field reads.

Do you want to find out whether NVIDIA, Google, Anthropic, and the rest are selling the opposite of what they are intensively researching? Then don't read their press releases, their stage-managed public events, or the effusive declarations of their CEOs. Do something far simpler: grab the research papers of their fringe teams, and you will see where this story begins and how it drives into a new territory of AI that should set off alarms for you.

Don't worry. I have done the homework for you. Here we go.

We Should Already Have a Truly Non-Parroting AI

But first, let me be precise about what is under attack in current AI.

Ok, we have already mentioned transformers. Sure, they have positional encoding. They have attention. They mix each token with the tokens around it across many layers of computation. They are not blind to context in the trivial sense; anyone arguing that they are has not opened the 2017 paper.

The legitimate attack is not that AI is blind.

The legitimate attack is that the current architecture handles context in a structurally wrong way, and scaling computation cannot truly fix it.

Attention layered on top of flat embedding spaces does not, by itself, measure the semantic distance between concepts that overlap and shift with context.

Take justice and law as a worked example.

Why Flat AI Cannot Measure Semantic Distance For petty crimes (top), justice and law sit at the same distance in democratic USA and theocratic Iran. For political crimes (bottom), the distance diverges sharply: in the USA, criticizing the government is protected speech, so the concepts stay close; in Iran, criticism is criminalized, so the concepts split apart. Most concepts behave this way: overlapping in some contexts, separating in others. They are not linearly separable. Flat AI has no arch
Why Flat AI Cannot Measure Semantic Distance For petty crimes (top), justice and law sit at the same distance in democratic USA and theocratic Iran. For political crimes (bottom), the distance diverges sharply: in the USA, criticizing the government is protected speech, so the concepts stay close; in Iran, criticism is criminalized, so the concepts split apart. Most concepts behave this way: overlapping in some contexts, separating in others. They are not linearly separable. Flat AI has no architectural slot for the regime to enter the computation, so it collapses the relation into one statistically averaged distance. That average is not right in any specific context. Flat AI computes the straight chord correctly inside its own flat embedding, but cannot compute the curved geodesic on the semantic manifold. Like a Mercator map, the chord is valid on the map but says nothing about distances on the curved territory. Flat AI is the Mercator AI of semantic computation.

The distance between these two concepts in the democratic United States and in the theocracy of Iran may look almost identical when you are talking about petty crimes and ordinary civil offenses. But the moment you move to political crimes, the two concepts can sit at opposite ends of the semantic space.

Same two concepts, yeah, but…

Different distance.

Different context.

That is the point.

Meaning is not just a word sitting near another word. Meaning is a position inside a changing semantic terrain. The distance between concepts depends on the background against which they are read.

Flat AI embeddings cannot do this for free. They force the model to fake it by feeding it enough examples to memorize how the distance shifts: different regimes, different offenses, different legal traditions, endless combinations. Brute statistical memorization where geometry should be doing the work.

It is the flat-map problem one layer deeper. You cannot measure curved distances with a straight ruler, no matter how many rulers you stack. Context is not just another word the model processes. It is the ground the words sit on, the surface that decides what the distances between them mean. Flat AI keeps trying to do computation where it should be doing geometry, and no amount of training data can fix that.

It is like trying to measure speed with a stopwatch alone, ignoring distance. A stopwatch gives you time. A speedometer combines time and distance into one quantity — speed — that neither alone can give. Flat AI is the stopwatch (a functor: one input, one direction). Real cognition needs the speedometer (a profunctor: two inputs of different kinds, one a concept, the other its context).

Flat AI is a functor pretending to be a profunctor: one input slot, one direction, no place for context, and brute-force statistics frantically trying to make up for the missing dimension. Real cognition is the profunctor — two inputs of different kinds, one a concept, the other its context — that flat AI cannot become no matter how much compute you throw at it.
Flat AI is a functor pretending to be a profunctor: one input slot, one direction, no place for context, and brute-force statistics frantically trying to make up for the missing dimension. Real cognition is the profunctor — two inputs of different kinds, one a concept, the other its context — that flat AI cannot become no matter how much compute you throw at it.

Their own papers are the smoking gun

So now you know: current commercial AI is pretending to use a functor like a profunctor, and a stopwatch like a speedometer. But have the labs themselves fallen into that trap? Of course not, as you can find in their own technical papers telling you exactly the opposite.

I know, you like me could have been confused by the marketing pages telling you that new flavors of transformers are still the future, and by the CEOs out there to sell everyone the idea that burning more tons of money on ludicrous raw computation will fix it.

So let us see how the research that never makes the press headlines ranks every AI company in the real race, the one going on beyond raw computation and the next flavor of transformer. Read the table below as an inventory, not an opinion. Everything in it is what the labs have published in their own journals, under their own names, dated and signed, and every claim can be checked against the papers they themselves released in the past twelve months. So let me read the evidence aloud to you.

The AI Companies Ranked by What They Know Versus What They Sell The A tier builds geometric AI in the open because its customers pay for the geometry. The B tier — Anthropic and Google DeepMind — publishes the same mathematics in its own journals while shipping flat transformers under product names. The C tier, NVIDIA, sells the picks and shovels of the wrong race. The D and E tiers ship flat architectures without the math to know it. The typography itself encodes the descent: mathematical trut
The AI Companies Ranked by What They Know Versus What They Sell The A tier builds geometric AI in the open because its customers pay for the geometry. The B tier — Anthropic and Google DeepMind — publishes the same mathematics in its own journals while shipping flat transformers under product names. The C tier, NVIDIA, sells the picks and shovels of the wrong race. The D and E tiers ship flat architectures without the math to know it. The typography itself encodes the descent: mathematical truth at A, editorial gravity at B, industrial documentation at C, corporate press-release voice at D, performative script at E. For investors, A tier is the asymmetric-upside bet on architectural correctness: early, powerful, but vulnerable to execution. B tier is the safe-conviction long: Anthropic and Google have the math, the capital, and the talent density to pivot when the geometric moment arrives. C tier is the structural paradox: tactical long while flat scaling continues, structural short the day the industry pivots. D and E are the squeezed middle, with neither architectural advantage nor capital depth. The barbell is A for upside, B for downside protection.

The gun is warm. Anthropic's research team has known for months that they have to make explicit what some of their models are already doing accidentally. Their paper When Models Manipulate Manifolds shows exactly that: one of their own AI models, despite being trained in the opposite direction, performs its computation by twisting curved manifolds. The conclusion is clear. The AI itself is begging them to turn the flat space into the curved geometry it needs, not just for one specific task but for solving most of the problems flat AI cannot solve.

Anthropic Found the Curve They just have not put it in the architecture yet. Carry a vector around a closed loop. In flat space, it returns unchanged, which is another way of saying that order does not matter in flat space. In curved space, it returns rotated, and the angle of rotation is the holonomy. That rotation is the signature of meaning: dog bit man and man bit dog land at different positions on the curved manifold of cognition because the order of the loop changed, and the geometry reme
Anthropic Found the Curve They just have not put it in the architecture yet. Carry a vector around a closed loop. In flat space, it returns unchanged, which is another way of saying that order does not matter in flat space. In curved space, it returns rotated, and the angle of rotation is the holonomy. That rotation is the signature of meaning: dog bit man and man bit dog land at different positions on the curved manifold of cognition because the order of the loop changed, and the geometry remembered.

The gun was in the drawer, and they knew. In a quieter update published months before the manifold paper mentioned above, Anthropic admitted the embarrassing thing in their own words: the mathematics they had been using to describe what their models were doing was the wrong mathematics, and the right one was about features and geometric structure, not the algebraic flat-space stuff they had been shipping to the public.

Translation: while their CEOs sell scaling and the marketing pages sell the next flavor of transformer, their own research team is telling us that the architecture they are selling is doing the right thing despite being trained on the wrong mathematics. The official story is flat AI getting better. The research story is that the architecture is screaming for the geometry they have not given it, and until they give it, the model will be partially right on simple tasks and wrong when the tasks become complex and require non-flat geometry.

Ballistics. Late last year, a paper called The Curved Spacetime of Transformer Architectures dropped, co-authored by scientists with previous stops at Google, Apple, and Cohere. Yes, the same Cohere that builds production LLMs against Anthropic and OpenAI.

The paper's pitch? Einstein, but for transformers. General relativity as the analogy, not as the metaphor. Queries and keys induce a metric on representation space. Attention is a discrete connection doing parallel transport of value vectors across tokens. Stacked layers trace evolution through a curved manifold. Read that again, because that is, word for word, the geometrical frame for the new AI we are betting on in this article: parallel transport. A semantic manifold whose curvature is induced by the attention mechanism itself. Not a thought experiment. They are naming the exact architecture flat AI does not have and the production stack does not implement.

Finally the showdown begins. Earlier last year, a different group of researchers at Yale stopped waiting. They did not theorize, did not propose, did not gesture toward future work. They opened up the decoder-only language models the labs are currently shipping and measured the curvature directly. Specifically, they computed Ricci curvature distributions on the embedding spaces of Llama 2 (Meta), Gemma 2 (Google DeepMind), and DeepSeek-MoE (DeepSeek), and found substantial negative curvature in all three, implying higher local hyperbolicity in the embeddings the flat architecture is supposedly processing as flat.

Pause on that for a second. The funny part is not the math. The funny part is that the LLMs are mutinying against their own designers. Meta's engineers programmed Llama to live in flat Euclidean space. Llama is, on its own, organizing its embeddings into a curved hyperbolic geometry. Google's engineers programmed Gemma to live in flat Euclidean space. Gemma is doing the same. DeepSeek's engineers programmed DeepSeek-MoE to live in flat Euclidean space, and guess what? DeepSeek-MoE is rebelling and fabricating its own curved space.

The architecture the labs sell as flat is not flat, has never been flat, and the models themselves are the ones telling us that. The flat-AI engineers are losing an argument with their own software. The Yale team then took the obvious next step and built a fully hyperbolic LLM that outperformed Meta's LLaMA and DeepSeek's MoE on MMLU, ARC, and four other reasoning benchmarks. The bluff has been called.

And here is what the bluff being called actually means: hallucination is not a software bug the labs are about to patch. It is a structural consequence of running cognition on a flat space when the cognition itself requires curvature.

The geometry the AI is growing on its own is the geometry the AI needs. Until the architecture is rebuilt around that geometry instead of in spite of it, the model will keep being partially right on simple tasks and hallucinating exactly where the geometry actually matters. Multiple papers, ours among them, have been pointing at this for years. The working consensus of geometric deep learning is co-authored by Google DeepMind's Petar Veličković, and NVIDIA Research has been publishing equivariant and topological architectures with Michael Bronstein himself.

There you go, the labs are publishing the architecture they refuse to ship. The Yale paper is the empirical proof that the path forward is not more GPUs. The path forward is the curved architecture the models have been begging for. The architecture race the frontier labs thought they had won by capex alone has a new front, and that front is geometric.

Cognition, Computation, and Geometry as One Architecture Three composable layers of a single machine. Three parameters an architect can dial. One mathematical fact lifts through all three: curvature at the geometry layer becomes a non-commutative bracket at the computation layer and appears at the cognition layer as what ordinary language calls order. Dog bites man is not man bites dog. Flat AI sets the geometry parameter to zero, and the whole composition collapses. This is the architecture AI
Cognition, Computation, and Geometry as One Architecture Three composable layers of a single machine. Three parameters an architect can dial. One mathematical fact lifts through all three: curvature at the geometry layer becomes a non-commutative bracket at the computation layer and appears at the cognition layer as what ordinary language calls order. Dog bites man is not man bites dog. Flat AI sets the geometry parameter to zero, and the whole composition collapses. This is the architecture AI companies have already discovered accidentally inside their own production models, and it is the architecture the next AI will be designed around.

And Now, the Case Rests

Look at where we are. An accelerating stream of mindblowing research evidence over the last few years, every piece of it landing on the same architectural diagnosis. The interpretability teams know the geometry is curved.

Everybody at this party knows how it is going to end, but they have chosen not to tell the guests until the last moment, because they are still selling tickets at the door. The research teams know the architecture is the wrong shape for the job. The CEOs know the geometry teams know, because the geometry teams publish on the company servers, under the company names, with the company funding the buildings they work in. Everybody in the lab knows, and the product still ships flat. The show must go on until all the investments are returned in profits.

So what does that mean for you. It means hallucination is not the bug they will quietly patch in the next release. It means the next five trillion dollars of capex on flat transformers will not buy you out of the hallucination problem, because the hallucination problem is geometric and you cannot scale your way around geometry the same way you cannot scale your way out of the laws of motion. It means the labs that figure out how to build the curved architecture before the flat one runs out of runway will own the next decade, and the labs that keep selling flat transformers will spend the next decade explaining to their boards why the curve they could have shipped is now shipping under someone else's logo.

The good news— and there is, indeed, good news — is that the curve is not hiding. The math is open. The geometry is documented in papers anyone with a browser can read. The Yale team built a hyperbolic LLM at a billion parameters on a university budget and beat the production stack. The Cohere ex-engineer wrote the Einstein analogy and dropped it on arXiv. Anthropic's own interpretability group has been doing the structural reverse-engineering in the open for years.

The architecture of the next AI is being built in plain sight by people whose only resource is curiosity and a willingness to take the geometry seriously, and that means anyone with the same two resources can join the build. The case rests on the labs. The future is open to everyone else.

Cognition, Computation, and Geometry as One Architecture Three composable layers of a single machine. Three parameters an architect can dial. One mathematical fact lifts through all three: curvature at the geometry layer becomes a non-commutative bracket at the computation layer and appears at the cognition layer as what ordinary language calls order. Dog bites man is not man bites dog. Flat AI sets the geometry parameter to zero, and the whole composition collapses. This is the architecture AI
The new thinking machine in one diagram. A principal SU(N) bundle over the p-adic solenoid, viewed three ways: as a profunctor type-signature in the category-theoretic language, as a curved manifold with non-zero holonomy in the geometric language, as a layered cascade lifting from geometry through computation into cognition in the engineering language. Three different mathematical vocabularies, one architectural object, all of them describing the same machine.

Summing up. The math has been done. The implementation is feasible. The smoking gun is on their own publication sites. The only thing missing is the will to pick it up and admit what it is. The smoking gun will become the submachine gun. The best-placed party for that transition is the increasingly close Google-Anthropic partnership.