Sidekick AI Orchestration

Mythos Is Already Obsolete

Chase Bernier·April 2026·7 min read

Mythos launched a few days ago as the most capable AI model ever built. In eighteen months it will be a baseline. In three years it will look like GPT-4 does today.

That is not hyperbole. It is the continuation of a trend you can already plot. Once you see it clearly, every AI decision you make in the next eighteen months looks different.

Two Curves, Not One

AI progress is two curves running at the same time. The chips get faster, and the models get smarter on the chips that already exist. Most coverage tracks one or the other. The interesting part is what happens when you stack them.

They compound multiplicatively. A 4x hardware gain on top of a 3x algorithmic gain is not 7x. It is 12x. At the scale these numbers run today, that distinction is the difference between “another product cycle” and “phase transition.”
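The arithmetic is worth seeing on its own line. A minimal sketch, using the illustrative 4x and 3x figures from above:

```python
# Gains on independent axes multiply; they do not add.
hardware_gain = 4.0   # silicon: compute gain per chip generation
algorithm_gain = 3.0  # training, data, architecture gains on fixed silicon

additive = hardware_gain + algorithm_gain        # the intuitive (wrong) read
multiplicative = hardware_gain * algorithm_gain  # what actually happens

print(additive)        # 7.0
print(multiplicative)  # 12.0
```

At small numbers the gap between 7x and 12x looks academic. Run the same multiplication over three generations and the two readings diverge by an order of magnitude.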

[Figure: both curves on a log scale, 1x to 16x, 2023 through 2028, with markers for GPT-4, Opus 4, Mythos, and T2, T3, T4. Series: Algorithm (USAMO score vs GPT-4) and Silicon (compute vs Trainium2).]
Both curves on log scale. The algorithm curve flattens because benchmarks saturate, not because progress stops.
Three Years Ago

The forward curve is not a leap of faith. The backward half of the same line is already on the chart.

Three years ago, the best AI model in the world (GPT-4, released March 2023) could not pass a high school math contest. On the qualifying exam for the USA Math Olympiad, it solved fewer than 1 in 10 problems. On the Olympiad itself, it solved zero. The headline AI benchmark of the day was “can it pass the bar exam,” and that felt ambitious. Two years later, the same line of models was solving 8 or 9 of every 10 problems on the qualifier and about 4 of every 10 on the Olympiad. Six months after that, Mythos was solving nearly 10 out of 10.

And the curve is bending, not flattening. Each generation now helps train the next: synthetic data generated by frontier models, training recipes tuned by other models, AI systems helping design AI systems. The trend is getting recursive. The forward projection in the rest of this post is the conservative reading of where it goes next, not the aggressive one.

The Algorithm Alone Changed Everything

Picture a math test that only the top few hundred high school students in America are even invited to take. Six months ago, the best AI model in the world solved about 4 out of every 10 problems on it, about the same as the students who qualify. Today, Mythos solves nearly all of them.

Same test. Same chips. Six months. The hardest things we knew how to test for in 2023 are essentially solved by 2026.

That jump is pure algorithm. Better training, better data, better architecture, no new silicon involved. The same pattern shows up in open source: Google's Gemma 4, an AI any developer can download for free, went from solving about 2 of 10 problems on the same olympiad qualifier to nearly 9 of 10 in a single model generation, again with no new chips.

The “hardware is getting bigger” coverage misses this entirely. A full capability leap can happen on the existing chips, before the next generation arrives. Then the next chip lands, and the curve leaps again from a higher starting point.

The Hardware Roadmap Is Public

There is a useful pattern in how long it takes a new chip to show up as a new flagship model.

Trainium2 became generally available in December 2024. Anthropic's first frontier model trained on it, Claude Opus 4, was released in May 2025. About five and a half months from chip to flagship. Then in late October 2025, Anthropic and AWS activated Project Rainier, a cluster of nearly 500,000 Trainium2 chips, five times larger than anything they had run before. Mythos, trained on Rainier, was released in April 2026. Five months again. The same gap, twice.

That gives us a working rule: roughly five months from a new silicon milestone to the first flagship that lands on the other side of it.

Trainium3 became generally available in December 2025. It delivers 4.4x the compute per server of Trainium2, with 4x the energy efficiency. The first frontier model trained on it is almost certainly already in training. On the five-month rule, it lands in the second half of 2026.

Trainium4 is on AWS's public roadmap for late 2026 or early 2027, with another 3x the compute of Trainium3 and 4x the memory bandwidth. The first frontier model trained on it lands in 2027 or 2028.

From Trainium2 to Trainium4, the silicon under the frontier gets roughly 13 times more powerful in three to four years. And that is only one of the two curves.
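The 13x figure is just the per-generation multipliers quoted above, chained together. A quick check:

```python
# Per-server compute multipliers quoted in this post,
# each relative to the previous chip generation.
gen_multipliers = {
    "Trainium3 vs Trainium2": 4.4,
    "Trainium4 vs Trainium3": 3.0,
}

cumulative = 1.0
for generation, multiplier in gen_multipliers.items():
    cumulative *= multiplier
    print(f"{generation}: {multiplier}x (cumulative vs Trainium2: {cumulative:.1f}x)")
# Trainium2 -> Trainium4 works out to roughly 13.2x
```

These are vendor-quoted peak numbers, so treat the product as a ceiling, not a delivered figure.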

What The Next Steps Look Like

Tier names get abstract fast. The more useful question is: what work can each generation actually do that the previous one cannot?

Where we are now (Mythos). Better than almost any human at almost any test you can write down. Mythos is not “smart for an AI.” It scores higher than the best high school math competitors in America, and it does the equivalent in almost every other domain.

Next step up (the first Trainium3-trained generation, second half of 2026). Stops being a test-taker and starts being a researcher. The difference is the difference between a brilliant student and a working scientist. A student answers questions someone else thought to ask. A researcher decides which questions are worth asking, runs the experiment, interprets the result, and writes the paper. A model at this tier produces work a tenured professor would treat as a serious contribution rather than a curiosity.

The step after that (the first Trainium4-trained generation, 2027 or 2028). Connects fields humans keep separate. Most scientific breakthroughs of the last fifty years came from someone noticing that a pattern in one discipline explains an open problem in another. A model at this tier makes those leaps as a default behavior, across every field at once, in weeks instead of decades.

No lab has publicly committed a chip generation to one of these descriptions. The mapping is a projection, not a roadmap. The silicon underneath it, however, is already on order. Once it ships, the question stops being whether the AI can do the work and starts being whether your business is set up to use it.

Where This Could Be Wrong

Everything above is a trend extrapolation, not a forecast. Both curves have variables that can slip, and the hardware curve in particular is now running into physics rather than engineering.

Nvidia's current flagship generation required a mid-cycle bump from 1,000 to 1,400 watts per GPU to hit its performance targets. The next generation has already forced the memory industry into a mid-program HBM4 redesign because Nvidia moved the pin-speed requirements after the roadmap was set. Power density, memory bandwidth, and cooling have become hard constraints rather than soft ones. Algorithmic progress has its own version of this: the next jump may not compound as cleanly as the last three.

The volatility band is wide in both directions. The next generation could land six months late and well below what the hardware promised. It could also overshoot. What to take from this piece is direction and slope, not specific dates. The trendline has been steep for three generations running. Betting it stops is a stronger claim than betting it continues, but neither is a certainty.

What This Means If You Run a Business

In the Mythos piece we said the durable advantage is not in having access to the best model. Within two to three years, everyone will have access. The advantage is in the layer above the model: the workflow that makes capability useful in an actual business.

The compounding curve sharpens the point. Every twelve to eighteen months, something that looks like “the absolute frontier” becomes the baseline. Companies that rebuild their AI workflows around each new model will rebuild them again when the next one drops. Companies with workflow-first architecture in place absorb each generation without a rebuild, and their systems get better automatically when the model improves.

Most teams that read this will nod and then build the opposite. The fast path to a working prototype is always to wire directly into whichever model happens to have the best API today. The prototype ships, the coupling hardens, and the rebuild tax arrives with the next release.

The practical implication is specific. Build on top of the model interface, not inside any single model's quirks. Own the workflow logic in a layer you control, so swapping one generation for the next is a configuration change rather than a rewrite. Treat any tight coupling to a current-generation model as technical debt that compounds with every release.
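Here is a hedged sketch of what that separation can look like in practice. The names (ModelClient, run_workflow, the client registry) are illustrative, not any real SDK; the point is only the shape: workflow logic talks to a narrow interface, and the concrete model behind it is chosen by configuration.

```python
from typing import Protocol


class ModelClient(Protocol):
    """The only model surface the workflow layer is allowed to touch."""
    def complete(self, prompt: str) -> str: ...


class MythosClient:
    """Stand-in for a current-generation provider client."""
    def complete(self, prompt: str) -> str:
        # A real implementation would call the provider's API here.
        return f"[mythos] {prompt}"


class NextGenClient:
    """Stand-in for whatever ships next; same interface, different internals."""
    def complete(self, prompt: str) -> str:
        return f"[nextgen] {prompt}"


def run_workflow(client: ModelClient, ticket: str) -> str:
    """Business logic lives here and knows nothing model-specific."""
    draft = client.complete(f"Draft a reply to: {ticket}")
    return client.complete(f"Tighten this reply: {draft}")


# Swapping generations is a configuration change, not a rewrite.
CLIENTS = {"mythos": MythosClient, "nextgen": NextGenClient}
client = CLIENTS["mythos"]()
print(run_workflow(client, "refund request"))
```

When the next generation ships, the registry gains an entry and the config key changes; run_workflow is untouched. That is the whole argument in twenty lines.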

The next model generation is already being trained. The one after that ships in eighteen months. The question worth sitting with is whether the architecture you are building today treats those generations as upgrades or as rebuilds.


Chase Bernier

Founder of Sidekick Orchestration. I write about what is actually happening in AI capability and what it means for the businesses trying to use it.

If this resonated, a 30-minute call is the fastest way to figure out what makes sense for your situation. No pitch deck. Just an honest look at where AI fits your day to day.