The Mythos Leak: When Anthropic's 'Too Dangerous' AI Found Its Own Way Out
Sometimes the story isn’t about the shiny new model. It’s about how even the people who built the scariest one couldn’t keep it in the box.
Back in late March 2026, Anthropic had a little configuration accident. Their CMS left a bunch of internal stuff sitting out in the open — including a draft post about a new model they were calling Claude Mythos. The draft didn’t exactly downplay things. It talked about a “step change” in capabilities and flagged some serious cybersecurity implications. Classic “we probably shouldn’t release this widely yet” energy.
Fast forward to April. The model (or at least the Preview version) was actually accessed by people who weren’t supposed to have it. A small group in a private Discord, one of whom had third-party contractor access, apparently guessed where it was hosted in a vendor environment and just… walked in. Anthropic said they were investigating. The group reportedly kept using it.
So we got two leaks for the price of one: the existence leak via messy internal tooling, and then actual model access via the eternal weak link of third-party vendor environments.
What Mythos Could Actually Do
Anthropic put out a preview page that laid it out pretty plainly. This thing was good at security work:
- It found and exploited vulnerabilities across major OSes and browsers
- It rediscovered a 27-year-old bug in OpenBSD that had survived decades of human review
- It wrote real exploits — heap sprays, ROP chains, privilege escalations, the whole messy toolkit
- On their internal benchmarks, it crushed previous Claude models at turning crashes into working exploits
They framed it as a defensive superpower (and launched something called Project Glasswing to use it that way with limited partners). But the same skills that let it hunt bugs for good also make it extremely useful on the other side of the fence.
The Beautiful Irony
Here’s where it gets good.
Anthropic built something they considered risky enough that they didn’t want it running wild. They tried the responsible thing — limited access, defensive focus, careful rollout.

And then a basic access-control gap, a contractor, and some old leaked knowledge about how they structure environments were enough for a Discord group to get in.
It’s almost poetic. The model that’s exceptionally good at finding security problems… and the company shipping it couldn’t fully contain it.
This isn’t some elite nation-state op. This is the same class of thing that bites everyone eventually: over-permissioned third parties, guessable paths, and the assumption that “limited” means “contained.”
What This Actually Means (Especially If You’re Building Agents)
If you’re messing around with local models, agents, or anything that touches real systems (looking at you, TARS-on-a-Jeep experiments), this story has teeth.
The frontier labs are struggling with the exact problems smaller operators will face at scale:
- How do you give powerful tools access without creating new attack surfaces?
- How do you audit what an agent actually did after the fact?
- What happens when your “secure” setup has one leaky vendor integration or one overly curious contractor?
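To make the first two questions a bit more concrete, here is a minimal Python sketch of one possible approach: a deny-by-default tool allowlist, plus a hash-chained log so you can at least tell afterwards whether the record of what the agent did was edited. Everything here is hypothetical and for illustration only (the `ToolGate` class, the tool names, the log layout); it is not how Anthropic or any particular agent framework does it.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, List


class ToolGate:
    """Hypothetical gate: runs only allowlisted tools and logs every attempt."""

    def __init__(self, allowed: Dict[str, Callable[..., Any]]) -> None:
        # Anything not in this mapping is denied by default.
        self._allowed = dict(allowed)
        self.log: List[Dict[str, Any]] = []

    def _record(self, tool: str, args: Dict[str, Any], outcome: str) -> None:
        # Each entry stores the previous entry's hash, forming a chain.
        prev = self.log[-1]["hash"] if self.log else ""
        entry = {"ts": time.time(), "tool": tool, "args": args,
                 "outcome": outcome, "prev": prev}
        payload = json.dumps(entry, sort_keys=True, default=str)
        entry["hash"] = hashlib.sha256(payload.encode()).hexdigest()
        self.log.append(entry)

    def call(self, tool: str, **args: Any) -> Any:
        if tool not in self._allowed:
            self._record(tool, args, "denied")
            raise PermissionError(f"tool not allowlisted: {tool}")
        try:
            result = self._allowed[tool](**args)
        except Exception:
            self._record(tool, args, "error")
            raise
        self._record(tool, args, "ok")
        return result

    def verify(self) -> bool:
        # Recompute every hash; an edited or reordered entry breaks the chain.
        prev = ""
        for entry in self.log:
            body = {k: v for k, v in entry.items() if k != "hash"}
            payload = json.dumps(body, sort_keys=True, default=str)
            digest = hashlib.sha256(payload.encode()).hexdigest()
            if body["prev"] != prev or digest != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


# Usage: one allowlisted tool, one denied attempt, then an audit check.
gate = ToolGate({"read_file": lambda path: f"contents of {path}"})
gate.call("read_file", path="notes.txt")
try:
    gate.call("delete_disk", target="/")
except PermissionError:
    pass
print([e["outcome"] for e in gate.log], gate.verify())
```

A log like this only helps if it lives somewhere the agent (and the curious contractor) can't rewrite wholesale, so in practice you would probably ship entries to separate storage. The chain just makes quiet edits detectable.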
Mythos didn’t leak itself in some dramatic sci-fi way (though some headlines tried to spin it that direction). It leaked the old-fashioned way — through people and configurations. That should probably make us all a little more humble about what we’re plugging into our own stacks.
The pace is only going to get weirder. Models are getting better at this kind of work faster than most organizations are getting better at securing the models themselves.
Sources & Further Reading
- Fortune on the unauthorized access
- Anthropic’s own Mythos Preview page
- Earlier Fortune piece on the initial document exposure
So yeah. Another chapter in “AI moves fast, security hygiene moves… when it feels like it.”
What do you think — does this make you more cautious about how you expose agent capabilities, or are we all just going to keep speedrunning these lessons?