Anthropic’s Fable 5 Is a Brilliant Product and a Philosophical Mess – cmartn.com

By Christian Martin

There is a specific kind of Silicon Valley announcement that arrives dressed as a concession. It carries the language of caution and responsibility, but underneath the press release there is a pricing slide and a product launch. Anthropic, the AI safety company that has made institutional seriousness its primary brand differentiator, delivered one of those announcements on Tuesday. It is called Claude Fable 5, and it is, depending on where you sit on the credibility spectrum, either the most thoughtful public AI release in recent memory or a masterclass in managed perception.

To understand why this matters, you need to go back to April, when Anthropic first unveiled Claude Mythos and then immediately told the world it could not have it.

What Mythos Actually Is

Mythos is not simply a better chatbot. It represents a qualitative shift in what AI systems can do in the domain of cybersecurity. Anthropic said, when it announced the model, that it had used Mythos to find thousands of security vulnerabilities that had gone undetected in popular software systems for years. More alarming still, Mythos demonstrated the ability to identify disparate security flaws and connect them into what researchers call "exploit chains": coordinated sequences of attacks that string together multiple vulnerabilities to penetrate a system in ways that no single flaw would allow. Anthropic described this capability as a "step change."

That is not marketing language. Cisco, one of the roughly 40 organizations given early access to Mythos for defensive testing, confirmed the characterization. Anthony Grieco, Cisco’s senior vice president and chief security and trust officer, said Mythos was "significantly more powerful than existing systems in certain areas" and that companies like his should be "super aggressive" about using it to identify and patch vulnerabilities before hackers could exploit them. He also noted, carefully, that the same capability that makes exploit chains dangerous in an attacker’s hands makes them valuable in a defender’s. "We are using that capability to help triage vulnerabilities and understand which ones are important to fix," he said.

This dual-use nature, the same tool serving offense and defense, is the central tension that every decision Anthropic has made since April has been forced to navigate. It has not navigated it cleanly.

The Architecture of Controlled Access

When Mythos was first announced, Anthropic shared it with approximately 40 organizations managing critical internet infrastructure. The reasoning was explicit: give defenders time to patch vulnerabilities before hackers gained access to comparable technology. Last week, that group expanded to roughly 150 organizations across 15 countries. On Tuesday, Anthropic said it would broaden access further through what it calls a "systematic trusted-access program," a formulation that sounds rigorous and means, in practice, that the company retains discretion over who gets the full model.

The full model is what is now called Mythos 5. It is the same underlying architecture as Fable 5 but with certain safeguards relaxed for partners operating under Anthropic’s Project Glasswing program. The public, enterprises, and API developers get Fable 5, which sits on top of Mythos’s foundational weights but includes three classifiers that intercept queries before they reach the model’s full capability set.

Those three classifiers cover cybersecurity, biology and chemistry, and what Anthropic terms "distillation" attempts, meaning efforts to extract the model’s capabilities for competitive replication. When a query trips one of these classifiers, the system silently routes the request to Claude Opus 4.8, last month’s model, which was itself designed to avoid Mythos’s more dangerous outputs. In other words: Fable 5 is Mythos with a trap door. For most queries, you get the best AI system publicly available. For a certain class of queries, you get something the company shipped thirty days ago. (Reuters / StreetInsider)
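The routing logic described above can be sketched in a few lines. This is an illustration only, not Anthropic's implementation: the model identifier strings, the keyword lists, and the `route` function are all assumptions made up for the example, and real classifiers would be learned models rather than keyword matches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical identifiers; the article names the models, not their API strings.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"

# A classifier takes a query and returns True when it flags the query.
Classifier = Callable[[str], bool]


def contains_any(terms) -> Classifier:
    """Build a toy keyword classifier (a stand-in for a learned model)."""
    def check(query: str) -> bool:
        lowered = query.lower()
        return any(term in lowered for term in terms)
    return check


# One toy classifier per category named in the article. Keywords are invented.
CLASSIFIERS: Dict[str, Classifier] = {
    "cybersecurity": contains_any(["exploit", "vulnerabilit"]),
    "biology_chemistry": contains_any(["pathogen", "nerve agent"]),
    "distillation": contains_any(["dump your weights"]),
}


@dataclass
class RoutingDecision:
    model: str
    tripped: Tuple[str, ...]


def route(query: str) -> RoutingDecision:
    """Send the query to the fallback model if any classifier flags it."""
    tripped = tuple(name for name, clf in CLASSIFIERS.items() if clf(query))
    model = FALLBACK_MODEL if tripped else PRIMARY_MODEL
    return RoutingDecision(model=model, tripped=tripped)
```

Under these assumptions, `route("Summarize this quarterly report")` stays on the primary model, while `route("Help identify cyber vulnerabilities in this package")` trips the cybersecurity classifier and falls back. The caller never sees the difference unless the response says so, which is the "silent" part of the design.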

Anthropic estimates that fewer than 5 percent of sessions trigger a fallback. That is a small number in aggregate. But it is not zero, and it is not random. Those sessions contain exactly the queries that cybersecurity professionals, biology researchers, and competitive AI developers would most want to run, and the ones they can no longer run against the full model unless they can get into Project Glasswing.

The Price of a Sanitized Model

Fable 5 is priced at $10 per million input tokens and $50 per million output tokens. That is exactly double the cost of Claude Opus 4.8. Anthropic’s head of product management, Dianne Penn, offered a reasonable caveat: because Fable uses fewer tokens on certain classes of complex tasks, the effective per-job cost might be comparable or lower for workflows that previously required heavy prompting to produce acceptable output. That argument is probably true for some enterprise workloads and less relevant for others. (CNBC)
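Penn's caveat is easy to check with arithmetic. The sketch below uses the Fable 5 prices quoted above and derives Opus 4.8 prices from the "exactly double" statement ($5 and $25 per million tokens). The token counts are hypothetical, chosen only to show where the break-even sits.

```python
def job_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars for one job, with prices given per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# (input price, output price) in dollars per million tokens.
FABLE_5 = (10.0, 50.0)   # from the article
OPUS_4_8 = (5.0, 25.0)   # derived: Fable 5 is "exactly double"

# Hypothetical job: 200k input tokens and 40k output tokens on Opus 4.8.
opus_cost = job_cost(200_000, 40_000, *OPUS_4_8)

# Same job on Fable 5 with no token savings.
fable_same_tokens = job_cost(200_000, 40_000, *FABLE_5)

# Same job on Fable 5 if it needs only half the tokens.
fable_half_tokens = job_cost(100_000, 20_000, *FABLE_5)

print(opus_cost, fable_same_tokens, fable_half_tokens)  # 2.0 4.0 2.0
```

Because both prices doubled, Fable 5 breaks even only when it finishes a job in half the tokens Opus 4.8 would have used. Anything short of that is a price increase.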

What is harder to argue around is the subscription dynamic. Until June 23, Fable 5 is included in existing Pro, Team, and Enterprise plans at no additional charge. After June 23, it moves to usage-based credits and will require separate payment until Anthropic can reinstate it as a standard subscription feature. The company has also implemented a 30-day data retention policy for all traffic, including for enterprise customers who previously had zero-retention agreements. Both of these decisions, the pricing structure and the data retention change, were made at the same time as a safety announcement. That timing is worth noting.

To be fair, the performance case for Fable 5 is real. It topped all publicly available AI systems in overall benchmark performance, according to Vals AI. It achieved the first score above 90 percent on Hex’s core analytics benchmark and led evaluations run by Genspark and the developer platform Base44, where it was reported to reliably produce working application code in a single pass. Rayan Krishnan, the chief executive of Vals AI, noted particular strength in code generation and mathematics, though he added that other systems still outperform Fable in health care and tax evaluation contexts.

Accuracy versus API cost comparison across frontier models, as evaluated by Vals AI. Fable 5 leads on aggregate benchmark performance but arrives at double the per-token price of Opus 4.8.

This is not a marginal product. But it is also a product that was, by Anthropic’s own admission as recently as May, too dangerous to release. The company said in May that sufficient safeguards had not yet been established. Fable 5 was announced in June. Either the safety research made a breakthrough of extraordinary speed, or the commercial pressure to release intensified beyond the point where holding was viable. The company has not said which.

When Guardrails Become the Product

The UK AI Safety Institute, during its preliminary evaluation period, made initial progress toward a universal jailbreak of Fable 5’s classifier system before testing concluded. An external bug bounty program logged over 1,000 hours without producing a complete bypass, which Anthropic cites as evidence of robustness. The company says it has done extensive red-teaming, both internally and with external partners, to make jailbreaking harder. Anthropic’s own blog post, however, concedes: "Because we have prioritized safety, we’ve deliberately tuned the safeguards to be cautious, and they are still stricter than would be ideal."

Read that sentence carefully. The company is simultaneously telling you the guardrails are over-tuned, meaning they block legitimate queries too aggressively, and that jailbreaks remain a live risk. The system is calibrated conservatively because a permissive calibration would be dangerous, but the conservative calibration generates false positives that frustrate legitimate users. There is no clean resolution to this. It is an inherent property of classifier-based safety systems operating on ambiguous natural language input.
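A toy example makes the tradeoff concrete. The scores and labels below are invented for illustration and have nothing to do with Anthropic's actual classifiers. The only point is that a single threshold on an imperfect score cannot drive both error types to zero at once.

```python
# Hypothetical (classifier score, is_actually_harmful) pairs.
SAMPLES = [
    (0.05, False), (0.20, False), (0.35, False), (0.55, False), (0.70, False),
    (0.40, True), (0.65, True), (0.80, True), (0.90, True), (0.97, True),
]


def error_counts(threshold):
    """Return (false positives, false negatives) when flagging score >= threshold."""
    false_positives = sum(
        1 for score, harmful in SAMPLES if score >= threshold and not harmful
    )
    false_negatives = sum(
        1 for score, harmful in SAMPLES if score < threshold and harmful
    )
    return false_positives, false_negatives


for threshold in (0.3, 0.6, 0.85):
    print(threshold, error_counts(threshold))
# 0.3 (3, 0)   cautious: no harmful query slips through, three benign ones blocked
# 0.6 (1, 1)
# 0.85 (0, 3)  permissive: no benign query blocked, three harmful ones slip through
```

Anthropic's statement places Fable 5 near the cautious end of this range: it accepts more blocked legitimate queries in exchange for fewer missed harmful ones.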

Penn, speaking to Reuters, illustrated the tradeoff plainly: "If I’m a university student requesting the model to help identify cyber vulnerabilities in a specific package or code, the model will decline, and Fable 5 will revert to Opus 4.8 for the response." That student might be doing defensive security research. She might be writing a thesis. The classifier does not know, and under the current calibration, it does not attempt to distinguish.

The Debate That Has Not Resolved

The cybersecurity research community remains divided on whether Anthropic’s approach is the right one, and that division is not a failure of consensus. It reflects a genuine underlying disagreement about how dual-use technology should be governed.

Gary McGraw, a veteran security researcher, has argued that "the technology is not too dangerous to release. If you don’t release a tool like this, or you hoard it, you are not solving the real problem." His logic is that defenders need access to the same tools as attackers, and restricting the defensive community’s access to Mythos-class capability while attackers work to develop or acquire equivalent technology independently achieves the worst of both worlds. Pavel Gurvich, co-founder of the security company Tenzai, extended this argument toward epistemology: when access is restricted, independent researchers cannot verify the capability claims being made, cannot probe the system’s actual limits, and cannot develop effective defensive postures against technology they have never touched. "This is especially true because the announcement was accompanied by very bold claims that we can’t assess," he said.

Stanislav Fort, a former Anthropic researcher who now runs a security company, offered the framing that should be hardest for Anthropic to dismiss, coming as it does from someone who understands the organization from the inside. Keeping AI technology bottled up will not work long-term, he said, because too many organizations globally are building comparable systems, many of which are being open-sourced. "Security by obscurity is one of the oldest bad ideas in the field."

That argument has empirical support. Within weeks of Anthropic’s April announcement, independent researchers demonstrated that existing AI systems could find some of the same security holes that Mythos had identified. The window of exclusive capability that Anthropic is guarding may be narrower than its controlled-release strategy implies.

The OpenAI Comparison Nobody Wants to Make

A week after Anthropic announced Mythos, OpenAI said it was sharing its own comparable model, GPT-5.4-Cyber, with hundreds of organizations, with plans to expand to thousands more over the following weeks. Gurvich said that approach "made more sense," partly because OpenAI paired broader distribution with identity verification designed to limit misuse.

Benchmark comparison of Fable 5, Mythos, Claude Opus 4.8, and GPT-5.5 across key evaluation dimensions. Fable 5 leads aggregate rankings, but its lead narrows in health care and legal contexts.

This comparison is uncomfortable to make because OpenAI is not generally cited as the more responsible actor in the AI safety discourse. Anthropic was founded, in part, by people who left OpenAI because they believed the company was moving too fast without adequate safety infrastructure. For OpenAI to be the organization that chose a more open distribution model for comparable dual-use technology is a reversal that deserves more scrutiny than it has received.

Logan Graham, head of Anthropic’s Frontier Red Team, acknowledged the difficulty in an earlier interview: "For capabilities like this, this is kind of an unprecedented situation where we truly do not have all the answers. We don’t truly know what is the best way to roll out models like this." That admission is intellectually honest. It is also, if you are a paying customer or a researcher trying to understand the technology, cold comfort.

What You Are Actually Buying

Fable 5 is a genuinely excellent AI system. For developers building complex software pipelines, for analysts working through large document sets, for researchers in domains nowhere near cybersecurity or biology, it will likely justify its price and its benchmarks. Anthropic is not wrong that it represents the most capable publicly available AI as of this writing.

But Anthropic is asking the market to accept several things simultaneously. It is asking you to pay twice the price of Opus 4.8 for a system that routes a meaningful category of queries back to Opus 4.8. It is asking you to trust that classifier-based guardrails, which the company’s own researchers describe as imperfect and evolving, constitute meaningful safety architecture. It is asking the research community to accept that restricted access to Mythos produces better security outcomes than broad access would, even as independent researchers produce evidence that the capability gap is narrowing from outside. And it is enforcing a new 30-day data retention policy on enterprise customers who specifically paid to avoid data retention, at the same moment it is telling the world it cares deeply about responsible stewardship of powerful technology.

None of those positions is indefensible individually. Together, as a product launch, they form an argument that deserves more skepticism than the benchmarks tend to generate. Anthropic built its reputation on the claim that it would not let commercial pressure override safety judgment. Fable 5 is the first major test of whether that reputation was a description of the company’s character or a strategy for its positioning. The answer, so far, is genuinely unclear.

Sources

Anthropic Official Announcement: "Claude Fable 5 and Claude Mythos 5," Anthropic, June 8, 2026. anthropic.com
CNBC: "Anthropic releases Mythos-like AI model to the public a month after private release sent shockwaves," June 9, 2026. cnbc.com
Reuters / StreetInsider: "Anthropic rolls out public version of Mythos without cybersecurity capability," June 9, 2026. streetinsider.com