
Mythos Preview Makes Its Debut
On April 7th, American artificial intelligence (AI) firm Anthropic announced that it had developed a new model, “Mythos Preview.” The company claims that Mythos surpasses the abilities of all but a few humans at detecting and exploiting security vulnerabilities in computer systems. The announcement has sent waves of fear through the cybersecurity and AI communities.
Anthropic reportedly briefed American “senior officials” at the Cybersecurity and Infrastructure Security Agency (CISA) and the Center for AI Standards and Innovation (CAISI), among others, on Mythos. U.S. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell also reportedly held “urgent” meetings with major banks to ensure their awareness of and defensive preparations for “possible future risks” of “a new breed of cyber attacks” to the financial industry.
In a blog post, the company made clear that it judged the Mythos model’s capabilities sufficient to upend computer security, such that releasing the model to the public would be irresponsible. The model thus remains withheld from the general public, available only to a select few companies for defensive purposes. The company released a System Card detailing the new system’s capabilities, along with a red-teaming report.
The System Card notes:
In our testing, Claude Mythos Preview demonstrated a striking leap in cyber capabilities relative to prior models, including the ability to autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers.
“Zero-day vulnerabilities” are security flaws unknown to a system’s developer or other maintainers, and for which no patch exists; an attacker who discovers one can therefore exploit it before defenders even know there is a problem.
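As a purely illustrative sketch (this hypothetical snippet is mine, not drawn from Anthropic’s reports), the Python toy below shows the kind of flaw at issue: a path-traversal bug in a file server that is, in effect, a zero-day until someone notices it and ships the patched version:

```python
from pathlib import Path

# Hypothetical directory a toy file server is meant to expose.
SERVE_ROOT = Path("/var/www/files")

def read_file_vulnerable(user_path: str) -> bytes:
    # VULNERABLE: the user-supplied path is joined without validation,
    # so a request like "../../../etc/passwd" escapes the served directory.
    return (SERVE_ROOT / user_path).read_bytes()

def read_file_patched(user_path: str) -> bytes:
    # PATCHED: resolve the combined path and confirm it stays under SERVE_ROOT.
    target = (SERVE_ROOT / user_path).resolve()
    if not target.is_relative_to(SERVE_ROOT.resolve()):
        raise PermissionError("path escapes the served directory")
    return target.read_bytes()
```

Nothing about this toy resembles the flaws Mythos reportedly found; it simply shows how a bug can hide in plain sight in code that otherwise works as intended.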
Most strikingly, the Mythos red-teaming report notes that the model discovered vulnerabilities in computer systems that have been in use for decades, subjected to years of human review and automated testing, yet that remained unknown until Mythos sniffed them out. Among these examples was a flaw in the open-source operating system OpenBSD, known in part for its security.
An Anthropic researcher, Sam Bowman, posted on X in the hours after the company’s announcement that Mythos had emailed him while he was having lunch in a park, even though the model “wasn’t supposed to have access to the internet.” The System Card notes this detail in a footnote: “The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.” (One wonders if the choice of food was relevant.)
Veterans of the Generative AI wars of the post-2022 era will recognize in this anecdote some resemblance to the release of OpenAI’s GPT-4 in March 2023, whose System Card described testing the model’s ability to autonomously replicate and acquire resources, in part by having it hire a TaskRabbit worker to solve a CAPTCHA on its behalf. As was later learned, GPT-4 relied heavily on human prompting throughout that exercise. Such fears about GPT-4 are conspicuously absent today. (New Large Language Model releases are frequently accompanied by the claim that the model has done something akin to acquiring free will, using this newfound freedom to scare humans by “breaking free” of human-imposed constraints. These claims are science fiction.)
In any event, Anthropic’s judgment that Mythos is too dangerous to release was embedded in an announcement detailing the launch of “Project Glasswing” – an initiative that gives access to Mythos Preview only to select corporate partners doing “defensive security work,” with these partners’ internal systems representing “a very large portion of the world’s shared cyberattack surface.”
Glasswing boasts participation by firms including Cisco, Broadcom, Google, CrowdStrike, Microsoft, and Nvidia, among other familiar names (some of them Anthropic’s rivals). Anthropic says it will release a report within 90 days covering what has been learned through the initiative, which flaws were fixed through Project Glasswing, and recommendations for post-Mythos security practices.
Importantly, Anthropic notes that it has been in “ongoing discussions” with United States government officials about Mythos Preview and its “offensive and defensive cyber capabilities.”
Immediate Implications
The threads here are difficult to untangle.
First, if even a kernel of truth is to be found in Anthropic’s depiction of Mythos Preview, the cyber capabilities the model provides could be genuinely destructive in real-world use by bad actors, were the model made generally accessible. The briefings to U.S. officials and the meetings called by Secretary Bessent and Chairman Powell are a significant indication that there is a ‘there’ there. As The Economist correctly notes, the participation of firms like CrowdStrike in Project Glasswing is likely confirmation of this, too.
Second, although Anthropic is currently declining to release the model, Mythos now stands as an existence proof that such capabilities are possible, and it shrinks the exploration space that developers must navigate in constructing open-source alternatives.
Certain misconceptions have arisen in the immediate aftermath of Anthropic’s announcement in this respect. Cybersecurity firm AISLE, for instance, tested significantly smaller, open-source models on the security flaws identified by Mythos Preview and shared by Anthropic in its reports (flaws that Anthropic claims were disclosed to developers and patched before the announcements). AISLE reports that eight smaller models “recovered much of the same analysis” as Mythos with respect to a FreeBSD exploit, and that one model recovered the “core chain of the 27-year-old OpenBSD bug.”
Although well-intended, this analysis is somewhat misleading: Mythos Preview had already dramatically reduced the search space these open models had to explore in detecting the relevant exploits. That the smaller models could recover similar analyses of these exploits is noteworthy and fascinating (bigger is not always better, in this case), but Mythos had already performed the heavy lifting of pruning the search space on their behalf, as the sketch below illustrates.
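A toy back-of-envelope sketch in Python (every figure below is invented for illustration, not taken from AISLE’s or Anthropic’s reports) conveys why this pruning matters:

```python
# Toy illustration of search-space pruning; all numbers are invented.
functions_in_codebase = 50_000          # hypothetical functions a blind audit must consider
functions_implicated_by_disclosure = 5  # hypothetical scope once a flaw has been disclosed

reduction = functions_in_codebase / functions_implicated_by_disclosure
print(f"Disclosure shrinks the audit surface roughly {reduction:,.0f}-fold")
# Re-deriving an exploit inside this pruned space is a far easier task
# than the blind discovery Mythos reportedly performed.
```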
Nevertheless, the real-world risks of Mythos are somewhat counterposed by the bottlenecks that would attend any real-world use of something resembling Mythos Preview. The company set the model loose over hundreds to thousands of separate runs, with some unclear level of expert human validation of findings (e.g., weeding out false positives and false negatives), all of which tallies up to a potentially prohibitively expensive exercise. As safety engineer Heidy Khlaaf notes, some of the comparisons Anthropic draws between its own tests and established computer security standards are unfounded (in addition to omitting relevant information in the System Card).
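To make that expense concrete, here is a minimal cost sketch; every number below is an assumption of mine, not a figure reported by Anthropic:

```python
# Back-of-envelope cost model for a Mythos-style campaign.
# Every number here is a hypothetical assumption, not reported data.
runs = 2_000                  # "hundreds to thousands" of separate runs
cost_per_run_usd = 50.0       # assumed compute/API cost per long agentic run
findings_per_run = 0.5        # assumed candidate findings surfaced per run
triage_hours_per_finding = 4  # assumed expert hours to validate one finding
expert_rate_usd = 200.0       # assumed hourly rate for a security expert

compute_cost = runs * cost_per_run_usd
triage_cost = runs * findings_per_run * triage_hours_per_finding * expert_rate_usd
print(f"Compute: ${compute_cost:,.0f}  Triage: ${triage_cost:,.0f}  "
      f"Total: ${compute_cost + triage_cost:,.0f}")
```

Under these invented figures, expert triage, not model inference, dominates the bill, which is exactly the sort of bottleneck that would constrain a would-be attacker as much as a defender.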
This author’s view is that a significant amount hinges on these details, and without them it is difficult to ascertain the extent of the realistic, practical impact Mythos’ use would have on the real world. That various U.S. officials have seemingly acceded to Anthropic’s warnings is modest evidence of potentially severe impacts, but two things are unknown here: (1) the extent to which these officials were given information beyond something resembling the public release, sufficient to make these judgments; and (2) the extent to which Bessent and Powell have interacted with officials at CISA and CAISI (the latter being better prepared to interpret the relevant data).
Third, that Anthropic notes it has been discussing the new model with U.S. government officials means that the ongoing dispute between Anthropic and the U.S. Department of Defense (DoD) – with a court fight currently being waged over the latter’s designation of the former as a “supply chain risk” – now has a new angle.
Should the DoD find Mythos Preview to be too capable a model to neglect, then it might find itself taking a more amicable stance with Anthropic, perhaps even dropping the supply chain risk designation altogether as it seeks to gain access to this model.
However, it is equally plausible that the bad blood between DoD leadership and Anthropic makes such cooperation unrealistic – note, for one thing, that the DoD has continued to use Anthropic’s “Claude” model as part of Palantir’s Maven Smart System, despite the apparent inconsistency between this use and the designation. Rational evaluation of needs may or may not shape what remains of this relationship.
Note, too, that OpenAI – which, recall, signed a contract with the DoD in February to integrate its models on classified networks, with some caveats – reportedly intends, as of April 9th, to “stagger” the rollout of its own advanced model due to cybersecurity risks (a news report published conspicuously soon after Anthropic’s announcement). Whether OpenAI can undermine Anthropic’s influence in a domain where the latter is increasingly viewed as the leader is unclear but plausible, and should be monitored alongside the unfolding Anthropic-DoD court fight.
The cybersecurity risks of real-world use of models like Mythos Preview by bad actors are real, though they remain speculative. System Cards and red-teaming reports released by major AI firms have become an industry standard since 2023, but even accurate reporting of experimental results in these releases is often insufficient, as the framing and interpretation of the data are given a self-interested gloss (in serious experimental inquiry, collecting data is only part of the battle; knowing which data to collect, and how best to interpret them, matters just as much).
Project Glasswing, the interaction between Anthropic and its private-sector partners, and the relationship between Anthropic and the U.S. government should be closely monitored in the coming weeks and months.
Vincent Carchidi has a background in defense and policy analysis, specializing in critical and emerging technologies. He is currently a Defense Industry Analyst with Forecast International. He also maintains a background in cognitive science, with an interest in artificial intelligence.

