Sidekick AI Orchestration

The Model You Can't Use Tells You Everything About Where AI Is Heading

Chase Bernier · April 2026 · 6 min read

Anthropic just built the most capable AI model in history. They also decided nobody should be allowed to use it.

Claude Mythos Preview, released April 7, 2026, set new records on every major AI benchmark, and Anthropic responded by locking it down. The model is, according to Anthropic, “far ahead of any other AI model in cyber capabilities” and could enable attacks that outpace the current state of defense. So access is restricted to Project Glasswing, a consortium of roughly 40 companies (Apple, Google, Microsoft, Amazon, CrowdStrike) using the model exclusively for defensive cybersecurity work.

That is worth paying attention to, not for what it says about Mythos, but for what the numbers underneath it signal about where everything else is heading.

The Numbers

Here is why the scale matters, even if you never touch Mythos directly: the size of these jumps signals what the tools you will be using in 12 months are about to look like.

The benchmark improvements over Claude Opus 4.6 (Anthropic's current publicly available flagship) are not incremental. On the USAMO (the hardest math competition given to US students), Mythos scored 97.6%. Opus 4.6 scored 42.3%. That is a 55-point jump in a single model generation, roughly the equivalent of going from a failing grade to near-perfect on a test designed to challenge the best math students in the country.

On long-context reasoning (processing and reasoning across 256K to 1M tokens of information), Mythos scored 80%. Both Opus 4.6 and OpenAI's GPT-5.4 sit at 21%. Nearly four times the capability, in a single generation.

On software engineering tasks, Mythos hit 93.9% on SWE-bench Verified, a 13-point improvement over Opus 4.6, while using 4.9 times fewer tokens on comparable browsing benchmark tasks.

A 55-point improvement on a single benchmark in a single generation is closer to a phase transition than a product update.

The Infrastructure Behind It

Mythos was trained on Project Rainier, an AWS cluster of 500,000 custom Trainium2 chips delivering hundreds of exaflops. That cluster is five times the size of Anthropic's previous training infrastructure. Anthropic's engineers are co-developing the next generation of this silicon (Trainium3) with AWS at the kernel level: four times the compute performance at 40% less power.

The release cadence tells the same story from a different angle. Anthropic shipped Haiku 4.5 in October 2025, Opus 4.6 in February 2026, Sonnet 4.6 later that month, and Mythos Preview in April. That is roughly a major model release every six to eight weeks. A year ago, that pace would have seemed reckless. Now it is the baseline. In another year, it will probably seem slow.

The hardware pipeline and the model cadence are both accelerating. They are compounding on each other.

The Convergence Nobody Is Talking About

The generational jump Mythos represents is not unique to Anthropic. That is the part worth paying attention to.

Google's Gemma 4, released days before Mythos, showed the same pattern in open source. On the AIME math benchmark, Gemma 4 scored 89.2%. Its predecessor scored 20.8%. A 68-point jump, using a model small enough to run on a laptop GPU. Different benchmark, different lab, same generational leap happening at the same time.

Across the broader landscape, the gap between open source and closed frontier models has collapsed to effectively zero on knowledge benchmarks and single digits on most reasoning tasks. Six major labs now ship competitive open-weight models. DeepSeek V3.2 already surpasses GPT-4.5 on math and coding.

The inference cost curve tells the rest of the story. GPT-4 equivalent capability cost $20 per million tokens in late 2022. It costs $0.40 today, a 50x reduction in three and a half years. According to Epoch AI's peer-reviewed analysis, the median rate of inference cost decline is 50x per year, with a conservative floor of 10x per year.

Apply that conservatively: Opus-class performance (currently $5–25 per million tokens via Anthropic's API) reaches commodity pricing within 12 to 18 months. Mythos-class performance, currently unavailable at any price, likely reaches $3–5 per million tokens within two to three years of general availability. The cost to achieve a given level of AI performance is falling by roughly an order of magnitude every 12 to 18 months, and nothing in the hardware or architecture pipeline suggests that trend will slow.
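The arithmetic behind these projections is straightforward compound decline. Here is a minimal sketch using only the data point given above ($20 to $0.40 per million tokens over roughly 3.5 years); the function names and the 18-month projection horizon are illustrative, not from any published methodology.

```python
def annual_decline_factor(start_price: float, end_price: float, years: float) -> float:
    """Implied constant yearly factor by which price divides."""
    return (start_price / end_price) ** (1 / years)

def project_price(current_price: float, factor: float, years: float) -> float:
    """Price after `years` if it keeps dividing by `factor` each year."""
    return current_price / (factor ** years)

# GPT-4-class tokens: $20/M in late 2022, $0.40/M about 3.5 years later.
factor = annual_decline_factor(20.0, 0.40, 3.5)
print(f"Implied decline: ~{factor:.1f}x per year")  # ~3.1x per year

# Opus-class pricing today ($5-25/M), projected 18 months out at that rate.
for price in (5.0, 25.0):
    print(f"${price}/M -> ${project_price(price, factor, 1.5):.2f}/M in 18 months")
```

Note that the observed data point implies a gentler slope (~3x per year) than Epoch's median estimate of 50x per year; even the conservative curve lands Opus-class pricing near commodity levels within the stated window.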

What This Means If You Run a Business

The capability layer of AI is commoditizing faster than most businesses realize, and that changes the math on when to start building. BenchLM.ai's cost-per-performance ratio (benchmark score divided by price per million tokens) shows the top value models today delivering 75% of frontier capability at 1/60th the cost. That value frontier moves as a wave: today's best-value tier becomes next year's commodity floor. What is considered enterprise-grade AI today will be available on a standard subscription plan within 18 months.
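To make the "75% of the capability at 1/60th of the cost" claim concrete, here is a toy version of that ratio. The model names, scores, and prices are invented for illustration; BenchLM.ai's actual figures and weighting may differ.

```python
# Hypothetical models: (benchmark_score, usd_per_million_tokens).
models = {
    "frontier-flagship": (90.0, 24.0),
    "value-tier":        (67.5, 0.40),  # ~75% of frontier score, ~1/60th the price
}

def value_ratio(score: float, price: float) -> float:
    """Cost-per-performance: benchmark points per dollar per million tokens."""
    return score / price

for name, (score, price) in models.items():
    print(f"{name}: {value_ratio(score, price):.1f} points per $/M tokens")

frontier = value_ratio(*models["frontier-flagship"])
value = value_ratio(*models["value-tier"])
print(f"value tier delivers {value / frontier:.0f}x the capability per dollar")  # 45x
```

Three quarters of the score at one sixtieth of the price works out to roughly 45 times the capability per dollar, which is why the value tier, not the frontier, is where the commoditization pressure shows up first.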

The durable advantage is not in having access to the best model. Within two to three years, everyone will have that access. The advantage lives in the layer above the model: the workflows, the configuration, the system architecture that makes AI capabilities useful in an actual business context. The businesses building that layer now, learning what works and refining their systems with each model generation, are compounding an operational advantage that will be very difficult to replicate once the rest of the market catches on.

One client put it bluntly: “I hate written communication. Like I hate it with a passion.” Two weeks later she was sending consistent client communications without drafting a word herself. The model did not change. The workflow was already built to absorb it. When the model does improve, the output improves automatically.

What We Are Building on Top of This

If every model upgrade means starting over with new tools and new setup, you are paying the switching cost while your competitors compound. That is the problem Sidekick Orchestration was built to solve: workflows first, AI layer second. When the underlying model improves, the system improves with it. No rebuild, no migration, no six-month integration project.

The question is not whether to start. It is whether you will have the architecture in place when the next model drops, which, based on the numbers above, is roughly every six to eight weeks.

If you want to see what that looks like for a real business, the companion piece to this post walks through one.


Chase Bernier

Founder of Sidekick Orchestration. I build agentic AI workflows for executives and small business owners who want AI that actually runs in their business, not just alongside it.

If this resonated, Sidekick Solo is the packaged version of what I build, for business owners ready to stop waiting for AI to figure itself out. I scope it to your workflow; you don't build anything. It starts with a 30-minute call, and you'll leave knowing whether Solo fits your workflow and exactly what would change if it did.