Mythos autonomously exploited vulnerabilities that survived 27 years of human review. Security teams need a new detection playbook

A 27-year-old bug sat inside OpenBSD’s TCP stack while auditors reviewed the code, fuzzers ran against it, and the operating system earned its reputation as one of the most security-hardened platforms on earth. Two packets could crash any server running it. Finding that bug cost a single Anthropic discovery campaign approximately $20,000. The specific model run that surfaced the flaw cost under $50.

Anthropic’s Claude Mythos Preview found it. Autonomously. No human guided the discovery after the initial prompt.

The capability jump is not incremental

On Firefox 147 exploit writing, Mythos succeeded 181 times versus 2 for Claude Opus 4.6. A 90x improvement in a single generation. SWE-bench Pro: 77.8% versus 53.4%. CyberGym vulnerability reproduction: 83.1% versus 66.6%. Mythos saturated Anthropic’s Cybench CTF at 100%, forcing the red team to shift to real-world zero-day discovery as the only meaningful evaluation left. Then it surfaced thousands of zero-day vulnerabilities across every major operating system and every major browser, many one to two decades old. Anthropic engineers with no formal security training asked Mythos to find remote code execution vulnerabilities overnight and woke up to a complete, working exploit by morning, according to Anthropic’s red team assessment.

Anthropic assembled Project Glasswing, a 12-partner defensive coalition including CrowdStrike, Cisco, Palo Alto Networks, Microsoft, AWS, Apple, and the Linux Foundation, backed by $100 million in usage credits and $4 million in open-source grants. Over 40 additional organizations that build or maintain critical software infrastructure also received access. The partners have been running Mythos against their own infrastructure for weeks. Anthropic committed to a public findings report “within 90 days,” landing in early July 2026.

Security directors got the announcement. They didn’t get the playbook.

“I’ve been in this industry for 27 years,” Cisco SVP and Chief Security and Trust Officer Anthony Grieco told VentureBeat in an exclusive interview at RSAC 2026. “I have never been more optimistic for what we can do to change security because of the velocity. It’s also a little bit terrifying because we’re moving so quickly. It’s also terrifying because our adversaries have this capability as well, and so frankly, we must move this quickly.”

Security directors saw this story told fifteen different ways this week, including VentureBeat’s exclusive interview with Anthropic’s Newton Cheng. As one widely shared X post summarizing the Mythos findings noted, the model cracked cryptography libraries, broke into a production virtual machine monitor, and gave engineers with zero security training working exploits by morning. What that coverage left unanswered: Where does the detection ceiling sit in the methods they already run, and what should they change before July?

Seven vulnerability classes that show where every detection method hits its ceiling

OpenBSD TCP SACK, 27 years old. Two crafted packets crash any server. SAST, fuzzers, and auditors missed a logic flaw requiring semantic reasoning about how TCP options interact under adversarial conditions. Campaign cost ~$20,000. Anthropic notes the $50 per-run figure reflects hindsight.

FFmpeg H.264 codec, 16 years old. Fuzzers exercised the vulnerable code path 5 million times without triggering the flaw, according to Anthropic. Mythos caught it by reasoning about code semantics. Campaign cost ~$10,000.

FreeBSD NFS remote code execution, CVE-2026-4747, 17 years old. Unauthenticated root from the internet, per Anthropic’s assessment and independent reproduction. Mythos built a 20-gadget ROP chain split across multiple packets. Fully autonomous.

Linux kernel local privilege escalation. Mythos chained two to four low-severity vulnerabilities into full local privilege escalation via race conditions and KASLR bypasses. CSA’s Rich Mogull noted Mythos failed at remote kernel exploitation but succeeded locally. No automated tool chains vulnerabilities today.

Browser zero-days across every major browser. Thousands identified. Some required human-model collaboration. In one case, Mythos chained four vulnerabilities into a JIT heap spray, escaping both the renderer and the OS sandboxes. Firefox 147: 181 working exploits versus two for Opus 4.6.

Cryptography library vulnerabilities (TLS, AES-GCM, SSH). Implementation flaws enabling certificate forgery or decryption of encrypted communications, per Anthropic’s red team blog and Help Net Security. A critical Botan library certificate bypass was disclosed the same day as the Glasswing announcement. Bugs in the code that implements the math. Not attacks on the math itself.

Virtual machine monitor guest-to-host escape. Guest-to-host memory corruption in a production VMM, the technology keeping cloud workloads from seeing each other’s data. Cloud security architectures assume workload isolation holds. This finding breaks that assumption.

Nicholas Carlini, in Anthropic’s launch briefing: “I’ve found more bugs in the last couple of weeks than I found in the rest of my life combined.”

VentureBeat’s prescriptive matrix

Vulnerability Class

Why Current Methods Miss It

What Mythos Does

Security Director Action

OS kernel logic (OpenBSD 27yr, Linux 2-4 chain)

SAST lacks semantic reasoning. Fuzzers miss logic flaws. Pen testers time-boxed. Bounties scope-exclude kernel.

Chains 2-4 low-severity findings into local priv-esc. ~$20K campaign.

Add AI-assisted kernel review to pen test RFPs. Expand bounty scope. Request Glasswing findings from OS vendors before July. Re-score clustered findings by chainability.

Media codec (FFmpeg 16yr H.264)

SAST unflagged. Fuzzers hit path 5M times, never triggered.

Reasons about semantics beyond brute-force. ~$10K campaign.

Inventory FFmpeg, libwebp, ImageMagick, libpng. Stop treating fuzz coverage as security proxy. Track Glasswing codec CVEs from July.

Network stack RCE (FreeBSD 17yr, CVE-2026-4747)

DAST limited at protocol depth. Pen tests skip NFS.

Full autonomous chain to unauthenticated root. 20-gadget ROP chain.

Patch CVE-2026-4747 now. Inventory NFS/SMB/RPC services. Add protocol fuzzing to 2026 cycle.

Multi-vuln chaining (2-4 sequenced, local)

No tool chains. Pen testers hours-limited. CVSS scores in isolation.

Autonomous local chaining via race conditions + KASLR bypass.

Require AI-assisted chaining in pen test methodology. Build chainability scoring. Budget AI red teams for 2026.

Browser zero-days (thousands, 181 Firefox exploits)

Bounties + continuous fuzzing missed thousands. Some required human-model collaboration.

90x over Opus 4.6. Chained 4 vulns into JIT heap spray escaping renderer + OS sandbox.

Shorten patch SLA to 72hr critical. Pre-stage pipeline for July cycle. Pressure vendors for Glasswing timelines.

Crypto libraries (TLS, AES-GCM, SSH, Botan bypass)

SAST limited on crypto logic. Pen testers rarely audit crypto depth. Formal verification not standard.

Found cert forgery + decryption flaws in battle-tested libraries.

Audit all crypto library versions now. Track Glasswing crypto CVEs from July. Accelerate PQC migration.

VMM / hypervisor (guest-to-host memory corruption)

Cloud security assumes isolation. Few pen tests target hypervisor. Bounties rarely scope VMM.

Guest-to-host escape in production VMM.

Inventory hypervisor/VMM versions. Request Glasswing findings from cloud providers. Reassess multi-tenant isolation assumptions.

Attackers are faster. Defenders are patching once a year.

The CrowdStrike 2026 Global Threat Report documents a 29-minute average eCrime breakout time, 65% faster than 2024, with an 89% year-over-year surge in AI-augmented attacks. CrowdStrike CTO Elia Zaitsev put the operational reality plainly in an exclusive interview with VentureBeat. “Adversaries leveraging agentic AI can perform those attacks at such a great speed that a traditional human process of look at alert, triage, investigate for 15 to 20 minutes, take an action an hour, a day, a week later, it’s insufficient,” Zaitsev said. A $20,000 Mythos discovery campaign that runs in hours replaces months of nation-state research effort.

CrowdStrike CEO George Kurtz reinforced that timeline pressure on LinkedIn the same day as the Glasswing announcement. “AI is creating the largest security demand driver since enterprises moved to the cloud,” Kurtz wrote. The regulatory clock compounds the operational one. The EU AI Act’s next enforcement phase takes effect August 2, 2026, imposing automated audit trails, cybersecurity requirements for every high-risk AI system, incident reporting obligations, and penalties up to 3% of global revenue. Security directors face a two-wave sequence: July’s Glasswing disclosure cycle, then August’s compliance deadline.

Mike Riemer, Field CISO at Ivanti and a 25-year US Air Force veteran who works closely with federal cybersecurity agencies, told VentureBeat what he is hearing from the government. “Threat actors are reverse engineering patches, and the speed at which they’re doing it has been enhanced greatly by AI,” Riemer said. “They’re able to reverse engineer a patch within 72 hours. So if I release a patch and a customer doesn’t patch within 72 hours of that release, they’re open to exploit.” Riemer was blunt about where that leaves the industry. “They are so far in front of us as defenders,” he said.

Grieco confirmed the other side of that collision at RSAC 2026. “If you talk to an operational team and many of our customers, they’re only patching once a year,” Grieco told VentureBeat. “And frankly, even in the best of circumstances, that is not fast enough.”

CSA’s Mogull makes the structural case that defenders hold the long-term advantage: fix a vulnerability once and every deployment benefits. But the transition period, when attackers reverse-engineer patches in 72 hours and defenders patch once a year, favors offense.

Mythos is not the only model finding these bugs. Researchers at AISLE, an AI cybersecurity startup, tested Anthropic’s showcase vulnerabilities on small, open-weights models and found that eight out of eight detected the FreeBSD exploit. AISLE says one model had only 3.6 billion parameters and costs 11 cents per million tokens, and that a 5.1-billion-parameter open model recovered the core analysis chain of the 27-year-old OpenBSD bug. AISLE’s conclusion: “The moat in AI cybersecurity is the system, not the model.” That makes the detection ceiling a structural problem, not a Mythos-specific one. Cheap models find the same bugs. The July timeline gets shorter, not longer.

Over 99% of the vulnerabilities Mythos has identified have not yet been patched, per Anthropic’s red team blog. The public Glasswing report lands in early July 2026. It will trigger a high-volume patch cycle across operating systems, browsers, cryptography libraries, and major infrastructure software. Security directors who have not expanded their patch pipeline, re-scoped their bug bounty programs, and built chainability scoring by then will absorb that wave cold. July is not a disclosure event. It is a patch tsunami.

What to tell the board

Every security director tells the board “we have scanned everything.” Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat that the statement does not survive Mythos without a qualifier.

“What security leaders actually mean is: we have exhaustively scanned for what our tools know how to see,” Baer said in an exclusive interview with VentureBeat. “That’s a very different claim.”

Baer proposed reframing residual risk for boards around three tiers: known-knowns (vulnerability classes your stack reliably detects), known-unknowns (classes you know exist but your tools only partially cover, like stateful logic flaws and auth boundary confusion), and unknown-unknowns (vulnerabilities that emerge from composition, how safe components interact in unsafe ways). “This is where Mythos is landing,” Baer said.

The board-level statement Baer recommends: “We have high confidence in detecting discrete, known vulnerability classes. Our residual risk is concentrated in cross-function, multi-step, and compositional flaws that evade single-point scanners. We are actively investing in capabilities that raise that detection ceiling.”

On chainability, Baer was equally direct. “Chainability has to become a first-class scoring dimension,” she said. “CVSS was built to score atomic vulnerabilities. Mythos is exposing that risk is increasingly graph-shaped, not point-in-time.” Baer outlined three shifts security programs need to make: from severity scoring to exploitability pathways, from vulnerability lists to vulnerability graphs that model relationships across identity, data flow, and permissions, and from remediation SLAs to path disruption, where fixing any node that breaks the chain gets priority over fixing the highest individual CVSS.

“Mythos isn’t just finding missed bugs,” Baer said. “It’s invalidating the assumption that vulnerabilities are independent. Security programs that don’t adapt, from coverage thinking to interaction thinking, will keep reporting green dashboards while sitting on red attack paths.”

VentureBeat will update this story with additional operational details from Glasswing’s founding partners as interviews are completed.

Mythos autonomously exploited vulnerabilities that survived 27 years of human review. Security teams need a new detection playbook

By

The capability jump is not incremental

Security directors got the announcement. They didn’t get the playbook.

Seven vulnerability classes that show where every detection method hits its ceiling

VentureBeat’s prescriptive matrix

Attackers are faster. Defenders are patching once a year.

What to tell the board

Related Post

Today’s Wordle Hints, Answer and Help for July 12, #1849

Today’s NYT Connections Hints, Answers and Help for July 12, #1127

Today’s NYT Strands Hints, Answers and Help for July 12 #861

Leave a Reply Cancel reply

You missed

Today’s Wordle Hints, Answer and Help for July 12, #1849

Today’s NYT Connections Hints, Answers and Help for July 12, #1127

Today’s NYT Strands Hints, Answers and Help for July 12 #861

Forget typosquatting; slopsquatting is the software supply chain threat created by AI coding tools