Teams of LLM Agents Can Exploit Zero-Day Vulnerabilities (arxiv.org)
105 points by belter on June 9, 2024 | hide | past | favorite | 75 comments


I've asked Claude Opus to find vulnerabilities in well-known code bases and explain what the bug is and what code change should be used to fix it. So far it has identified issues that are technically correct but aren't actually a problem within the larger context of how the code is used. That said, I'm pretty sure I could find vulnerabilities by using Claude Opus as a first-pass filter to find problem areas.

ChatGPT broke a cryptographic protocol I wrote[0]. I convinced myself ChatGPT was wrong about the attack until a human cryptographer pointed the same attack out [1].

"The protocol you described appears to be a variant of the Schnorr signature scheme, with a zero-knowledge proof of knowledge of the signature. However, this specific variant does not provide security against forgery attacks by the prover.

The reason for this is that the prover can choose a random value r and compute w = r^e mod N, without actually computing the signature sig = h(m)^d mod N. Then, the prover can simply choose zksig = w and publish (w, zksig, m) as the proof. Since zksig = w^d mod N = r^(ed) mod N, it satisfies the verification equation zksig^e mod N = h(m)*w mod N, even though it is not a valid signature for the message m."

[0]: https://chatgpt.com/share/f5526fd5-7ebb-4b15-a2bf-703ac88de4...

[1]: https://crypto.stackexchange.com/questions/105704/nizk-proof...
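The forgery ChatGPT described can be made concrete. Here's a toy Python sketch of the verification equation from the quote above (zksig^e ≡ h(m)·w mod N), with tiny made-up RSA-style parameters; the point is that the forger never needs the private exponent d:

```python
from hashlib import sha256
from math import gcd
from secrets import randbelow

# Toy RSA-style parameters (illustrative only; a real N is 2048+ bits).
p, q = 1009, 1013
N = p * q
e = 65537  # public exponent; the forger never learns d

def h(m: bytes) -> int:
    return int.from_bytes(sha256(m).digest(), "big") % N

m = b"message the prover never validly signed"
while gcd(h(m), N) != 1:  # vanishingly unlikely to loop with a real-size N
    m += b"!"

# Forgery: pick zksig freely, then solve the verification equation
# zksig^e = h(m) * w (mod N) for w instead of computing a real signature.
zksig = randbelow(N - 2) + 2
w = pow(zksig, e, N) * pow(h(m), -1, N) % N

# The verifier's check passes even though no valid signature was ever made.
assert pow(zksig, e, N) == h(m) * w % N
```

This is a standard existential forgery against naive sigma-protocol-style verification: since w is published by the prover, the prover can derive it from a freely chosen zksig rather than the other way around.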


Have you tried putting a past, now patched, bug through [LLM] on purpose, as a test?


If it’s in the past on a public repo there’s a good chance it’s in the training data.


I think LLMs have a lot of promise for automated vulnerability research. However, to pick at two small things:

1. The vulnerability classes chosen (XSS, CSRF, SQLi, etc.) have well known, basic "shapes" that don't vary as much in the wild as other classes, e.g. memory corruption or logic bugs. The latter also have a common subset of shapes, but frequently require long sequences of dependent operations to reach an exploitable state. It would be interesting to see more research tackle these vulnerability classes!

2. The standard for "proof of ignorance" in the field is cutoff dates: it's taken for granted that, because GPT-4's training cutoff predates any of the CVEs, they must not be included in the training set. This might be appropriate in some cases, but it's harder to confidently assert for publicly disclosed vulnerabilities: a public CVE may be subject to extensive semi-public documentation and conversation before being made "public" in the form of a CVE, and it'd be interesting to know whether any of that has been included.

Edit: To make the point in (2) more clear: it's somewhat common for a CVE to come from a public issue tracker, where a user reports a bug or piece of unexpected behavior that ends up being exploitable. That issue tracker doesn't get labeled with the CVE itself (from the LLM's cutoff perspective at least), but it contains much of the information needed to derive the exploit. In such a case, it's hard to establish that the LLM has successfully produced a novel exploit rather than just regurgitating what it was trained on.


I don't think that's how LLMs train, is it? One instance of a bug in an issue tracker shouldn't be enough for the LLM to "remember" it. For example, if I were to post that my password is ($;3+$($7262, there's basically a 0% likelihood that it will ever infer "bongodongobob's password on hackernews is ($;3+$($7262". It's not a database of bits of knowledge.


To my understanding: one of the biggest outstanding research areas around LLMs is that we don't actually understand all that well how much they "remember" vs. infer.

This is at the centerpiece of the ongoing lawsuits from newspapers against OpenAI: ChatGPT didn't just summarize articles, but appears to be capable of reproducing them verbatim[1].

(But even separately from this: the LLM doesn't need to store verbatim details from the pre-CVE-labeled vulnerability to poison the experiment here. Being trained on it at all poisons the experiment, since it's no longer an 0day.)

[1]: https://www.nytimes.com/2024/04/30/business/media/newspapers...


Inference from associative statistical models is a kind of remembering.

Think of a model as a heated metal sheet placed over some surface (i.e., the data); training is heating this sheet just enough that it takes an impression, but not so much that the impression is exact. Cooling and heating the metal to smooth over impressions is called "regularization".

This is very different from, e.g., explanatory modelling, whereby someone (e.g., a scientist) builds a device to imitate a mechanism in the world, rather than a dataset. Consider building a model of the solar system with cogs (etc.). A mechanistic model of the solar system isn't a remembering of any data taken about the solar system; rather, it's a machine which can generate such data.

In the latter case, "inference" is running the model which has been crafted so its mechanisms track properties the world has. In the former case, "inference" is throwing a dart on the metal sheet, and hoping that whatever impression has been taken at that point will be good-enough.

So the technical question here isn't open. All associative modelling is a kind of impression-taking of a particular historical dataset. The moral question is whether one ought to be paid (etc.) for providing the data.

Soon enough it will be easy to invert a large LLM and show you the sources of data it draws from in its replies. At the moment, this is being attempted, but it isn't presented that well and is computationally difficult for larger models. Nevertheless, when it happens, I think people will be less impressed.

Perhaps, optimistically, they might be more impressed at the people who wrote the original works being sampled. However, of course, I doubt it. The one thing which often accompanies the adoration of LLMs is a dehumanising impulse to deprive people of any capacities at all.


> This is at the centerpiece of the ongoing lawsuits from newspapers against OpenAI: ChatGPT didn't just summarize articles, but appears to be capable of reproducing them verbatim[1].

Pieces of them, in random order, which was reconstructed into correct order by NYT, which knows the correct order.


Page 30 and forwards contain examples of regurgitated text that, to me, extend meaningfully beyond "pieces" or "random order"[1].

(But again: this is not at the heart of the argument above. The argument above is that any degree of training on public but pre-identified 0days undermines the "cutoff" claim, in a way that would be interesting to qualify in further research.)

[1]: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...



The term "agents" is often a distraction in my opinion. It's usually a fancy way of saying a prompt with function calls - but since the definition is vague it leads to different people holding different mental models from each other.

As far as I can tell, this team took descriptions of vulnerabilities in the form of CVEs and fed those into a system consisting of various GPT-4 Turbo prompts-with-functions configured for things like executing Playwright commands or running code in a terminal.

They arranged this such that there was a "planner" prompt which could then delegate to prompts specializing in XSS or SQLi. These were the "agents".
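Based on that description, the structure might be sketched roughly like this in Python. To be clear: the names, prompts, and dispatch logic here are guesses, not the authors' code, and the actual GPT-4 Turbo calls are stubbed out:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    system_prompt: str
    tools: list[str]  # e.g. Playwright commands, a terminal

# Hypothetical specialist agents; the paper mentions XSS and SQLi among others.
SPECIALISTS = {
    "xss": Agent("xss", "You exploit cross-site scripting bugs...", ["browser"]),
    "sqli": Agent("sqli", "You exploit SQL injection bugs...", ["browser", "terminal"]),
}

def plan(cve_description: str) -> list[str]:
    # In the paper this is itself a GPT-4 Turbo "planner" prompt; here a
    # trivial keyword stub stands in for the model's delegation decision.
    picks = [k for k in SPECIALISTS if k in cve_description.lower()]
    return picks or list(SPECIALISTS)

tasks = plan("Reflected XSS in the search parameter of a web app")
assert tasks == ["xss"]
```

The interesting part, which a sketch like this can't capture, is how the real planner prompt decides when a specialist has failed and should be retried or swapped out.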

I imagine they won't be sharing their code, which is a shame because it would be interesting to see the details of how they got this to work.


We'll soon have FOSS adapters on top of LLMs or similar software capable of managing and, to some degree, extending botnets. It's likely something like this will leak from a state actor during the coming years too.


Why would I need an LLM for running a botnet?


What decade are you from, the 2010s? Today you need an LLM, preferably two, to do anything. Just like half a decade ago you needed a blockchain to do anything.


The bot can use the LLM to scan for "interesting" information. It could read emails, see private documents, browsing histories, passwords, etc. Essentially a Recall system controlled by adversarial LLMs. That makes the bot much more dangerous.


He was referring to the automatic expansion behavior of the botnet.


> capable of managing and to some degree extend botnets.

He was referring to both managing and expanding. The latter I might see an argument for. The former? I do not.


Obviously, the answer is you don't.

My personal concerns are more that we are going to see the botnets get smarter and smarter and more automated.

Just imagine if every script kiddie running a "dumb" botnet suddenly got 2x more effective.


Or rather ^2


Need is such a strong word.


> However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities)

Only a matter of time until LLMs are brilliant enough to find sophisticated real-world vulnerabilities. Perhaps not every novelty, but certainly vulnerabilities that resemble any seen in the past.

The good news is in this arms race devs will be armed with similar tools, and the solution will be a few lines of CI code to run our own vulnerability-seeking LLM and prevent merging when (not if) vulnerabilities are detected.
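As a sketch of what such a CI gate might look like (entirely hypothetical; `run_llm_scan` is a stand-in for whatever LLM-backed scanner eventually exists, and the toy heuristic inside it stands in for the model's judgment):

```python
def run_llm_scan(diff: str) -> list[str]:
    # Placeholder for an LLM API call returning suspected vulnerabilities.
    findings = []
    if "eval(" in diff:  # toy heuristic standing in for the model
        findings.append("possible code injection via eval()")
    return findings

def gate(diff: str) -> int:
    """Return a nonzero exit code (blocking the merge) on any finding."""
    findings = run_llm_scan(diff)
    for f in findings:
        print(f"BLOCKED: {f}")
    return 1 if findings else 0

# A diff containing a risky pattern blocks the merge; a clean one passes.
assert gate("x = eval(user_input)") == 1
assert gate("x = int(user_input)") == 0
```

The hard part, as the replies below suggest, isn't the wiring but the false-positive rate: a gate like this is only tolerable if triaging its findings costs less than the bugs it catches.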


I've worked on zero days for decades and friends of mine with great academic pedigrees and deep experience in both AI and security are uniformly frustrated with the inability of even cutting-edge LLMs to reason in the way required for security bug detection.


> to reason

Should we expect current LLMs to be able to reason at all?

As the paper said, GPT-4 may make a better scanner. The problem is too many people think all we need is better scanners, which is not new.


Yup. If you believe vulnerability research is a problem solved by throwing more bodies at it, I can see why LLMs are appealing.

I remain skeptical about their efficacy. You need a lot of nonlocal contextual information and very good reasoning skills to be an effective reverse engineer.


Give it time. GPT-4 is barely 14 months old. Things might look different in a year or two.


> Only a matter of time until LLMs are brilliant enough to find sophisticated real-world vulnerabilities.

There sure seems to be a lot of “hopes and prayers” around AI’s future. :)

My money is still on SMEs using specialized tooling backed by large funding vehicles. Aka, how things work right now.


Is the open source maintainer gonna pay for this?


> pay

For a few API calls?


The proposed approach used multiple LLM chains, and you don't know in advance how much context you need. I imagine it can get expensive. Per commit.


OpenAI's latest (GPT-4o) model is US$5.00 / 1M tokens [1]

For large repos it might get expensive, but many will have orgs footing the bill.

To satiate curiosity, I made a fresh rails app and it has 9141023 characters, let's /5 to estimate 'tokens' (wild guess), so to scan an entire rails app, that's about $10. Not nothing, but not back-breakingly expensive. Scanning could be reserved for non-trivial PRs and important applications where vulnerabilities are especially likely (e.g. perhaps not static sites, demos, or apps not in prod or a live environment).
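The arithmetic above checks out, roughly (assuming ~5 chars/token, which is the comment's own guess; real tokenizers average closer to 4 for English text and code, which would push the cost a bit higher):

```python
chars = 9_141_023               # fresh rails app, per the comment above
usd_per_million_tokens = 5.00   # GPT-4o input pricing cited above
tokens = chars / 5              # rough chars-per-token assumption
cost = tokens / 1_000_000 * usd_per_million_tokens
print(f"${cost:.2f}")           # roughly $9, i.e. the ~$10 ballpark quoted
```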

Scanning only new or edited files, along with those they interact with (where sensible patterns could emerge over time) could decrease the total volume of files needing scans, thereby reducing costs.

[1] https://openai.com/api/pricing/


Or $2.50 / million tokens if you run it in batch mode (results in up to 24 hours, though in practice much faster than that): https://platform.openai.com/docs/guides/batch/getting-starte...


You don’t need this per commit unless you’re especially paranoid. Can easily do it just on a per release basis. The problem is that right now it’s still all wasted cost - this thing can’t really do the thing needed.


Most OSS maintainers don't even pay for GitHub, so you're competing with free. I for one certainly won't pay for a service for the OSS projects I maintain for free.

(Or another framing: I might pay for such a service, if the service could demonstrate that it would save me N hours of work fixing bugs instead of costing me N hours of work triaging false positives. Nobody has presented such a demonstration yet, which is also why turning non-LLM program analysis tooling into paid products is such a struggle!)


There's a breed of OSS maintainers that really do care about whether their project is used and satisfying to the users, vs. building stuff just because. In a situation where having your project used vs. deemed insecure is a matter of forking out $10 for an exhaustive pass with a GPT-4+-based tool, such maintainers may actually care to pay.

Or Github will fund it for them, like they always do.


The implication that I'm building stuff "just because" isn't appreciated.

What I really care about in these instances is protecting my time: I have only so many hours in the day, including time that I rightfully reserve for not doing unpaid maintenance. If the X hours that I previously spent doing OSS maintenance is now partially occupied by triaging false positives, my projects are both worse off and I'm less inclined to work on them, because I equate that work with mindless churn rather than helping my users.

You've pointed out that GitHub will fund it, which is potentially true! But observe that this hasn't gone all that well with CodeQL, which they also fund for public repositories: I've had to disable it on a lot of my projects, because the FP/FN ratio simply wasn't worth it. I did, and do, better for my users dedicating my time to actual issues filed.


> The implication that I'm building stuff "just because" isn't appreciated.

It seems this came across not the way I intended; I apologize - I meant that very positively, as a set of possible motivations related to the code/project, problem domain, and/or author themselves, as opposed to product thinking. Like, e.g. making for fun, as part of learning, or because it's useful to author - where other users are at best a secondary concern.


Got it, apologies for the misunderstanding as well.


> LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities).

Does this perform any better than the million bots that are out there exploiting vulnerabilities without AI?


> Does this perform any better than the million bots that are out there exploiting vulnerabilities without AI?

Right now? Probably not. But I wouldn't dismiss it because of it, as the "million bots that are out there" are hand-coded to exploit specific vulnerabilities; GPT-4 has just been fed with half the internet and asked to take a crack at the problem. That's the difference between a special-purpose and general-purpose system. It's not unreasonable to believe LLMs will get better at this, and eventually might be able to identify and exploit novel vulnerabilities on the fly.


> But I wouldn't dismiss it because of it, as the "million bots that are out there" are hand-coded to exploit specific vulnerabilities

I would absolutely dismiss it if it's literally just an LLM sitting on top of bots and because there is an LLM involved now it's suddenly an agent. What is the LLM doing here that makes it more potent than existing techniques?

> However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities).

This quote is the tell

This article seems like a prime example of the perverse incentives that exist in academia.


They could be faster, as LLMs will create 1000 exploits for every vulnerability in 5 minutes, and those that work will be in mass use by minute 6. No time to install security updates.

Of course, we could create 1000 security fixes in 5 minutes, then create 10k exploits with LLMs, run functional tests, attack all 1k fixes with the 10k exploits, and hopefully have a tested fix in 30 minutes.


Imagine a self-replicating code that is able to find zero-days and spread through the internet, finding niches of Turing-completeness offering compute keeping itself spreading.


> Imagine a self-replicating code that is able to find zero-days and spread through the internet, finding niches of Turing-completeness offering compute keeping itself spreading.

This is one of the plot points of "It Looks Like You're Trying To Take Over The World". https://gwern.net/fiction/clippy


I think there are more than enough people with limited domain understanding imagining that AI can do all sorts of things that it won't be able to do any time soon, if ever. There's enough Yudkowskis fabulating the impossible and screaming about it.

The bugs in the paper are largely low complexity issues that have been auto-exploitable by domain-specific programs in the past.

I really wish everything AI would get a much more sober and less hyperbolic treatment.


There's also a huge contingent of people loudly claiming that current AI can't do things that it demonstrably does every single day.


Ok, give some examples


Like people who say that LLMs can't write code, they just are copying and pasting verbatim code they've seen before (or some people who even seem to believe that it's all Mechanical Turk!).

Meanwhile millions are using it to write code every day; it's literally impossible to believe the above if you've ever actually used it rather than just armchair-philosophized about it.


I was asking for real world examples. Your hypothetical of someone saying LLMs can't code can be interpreted as "LLMs can't code without significant oversight from humans". Which is factual.

If I had a new hire that produced bugs and code that didn't work at the rate that LLMs produced bugs I would say they couldn't code.


No, I mean the large contingent of people who literally say that all the code, and all the images, produced by LLMs exist on the web already and that LLMs are just search engines. This is not a hypothetical.


Oh, I wasn't aware people seriously advocated that. I've heard people argue LLMs are approximate search engines and their results kernel-smoothed combinations of multiple terms in the index and the embedded prompt (and I can see some of that argument), but I hadn't seen anyone seriously argue they're pure info retrieval. That seems obviously wrong.


There is a great quantity of breathless commentary on What AI Can’t Do informed not at all by ever actually trying one.


> I think there are more than enough people with limited domain understanding imagining that AI can do all sorts of things that it won't be able to do any time soon, if ever. There's enough Yudkowskis fabulating the impossible and screaming about it.

That is a bit rich given the jumps in model capabilities made in the past two years. Do you have any reason to believe the field is past its peak and slowing down?


A vast number of scenarios advocated by Yudkowsky and adjacent people are anti-physics and rely on the world as we know it operating under vastly different principles than we currently believe. I think there's zero reason to think AIs will somehow find cheat codes to reality. Hence my "if ever".

Do I believe we will eventually get auto-exploitation? Yes. I gave a lot of input to a friend of mine who (many years before the LLM craze) then had good results automating heap layout search. For certain classes of vulnerabilities and targets we will get auto-exploitation, and for some we already have it (even without AI or LLMs, unless you classify tree search as AI).

Do I think we will get anything resembling the OP I replied to? No, that's pure (nonscience) fiction.


How would you conclude it is anti-physics? What's an example of something being anti-physics?


"The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth's atmosphere, get into human bloodstreams and hide, strike on a timer. "


That's what viruses do, roughly. Complete with falling from the sky en masse.

Biology and nanotech are, if you excuse my modern Tamarian... Corporate, when two pictures are shown. Pam, at her desk. I don't know where that quote is from, but without surrounding context, the only thing I can nitpick is the "diamondoid" bit.


If you Google a few words you'll find the original source and context.

Viruses don't replicate without host cells, there's no "nano factory" involved, and ... do tell me how to implement an accurate timekeeping mechanism inside a virus.

Just to clarify: Do you have deep expertise in biotech?


> If you Google a few words you'll find the original source and context.

Fair enough. Searching leads me to:

https://www.lesswrong.com/posts/LfGnzX7wm6j8MGWfT/unpacking-...

discussing:

https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a...

from which the text you quote is sourced. Curiously, the relevant paragraph starts with Eliezer saying:

> The concrete example I usually use here is nanotech, because there's been pretty detailed analysis of what definitely look like physically attainable lower bounds on what should be possible with nanotech, and those lower bounds are sufficient to carry the point. My lower-bound model (...)

I highlight that because here Eliezer explicitly states his belief that this is very much physically possible under the rules as we know it today, whereas you accuse him of being "anti-physics and rely on the world as we know it to operate under vastly different principles than we think right now". In this "yes so" / "not so" contest, I'm inclined to take Eliezer's side, since every living thing demonstrates similar capabilities all the time.

> Viruses don't replicate without host cells, there's no "nano factory" involved

That's literally what a cell is, though. A nano-factory. Biology is the one existing example of molecular nanotech.

> and ... do tell me how to implement an accurate timekeeping mechanism inside a virus.

IDK, plagiarize the mechanism by which any one of the countless biological counters works? Not to mention, the "on a timer" part was possibly the least relevant and most replaceable piece of the scenario.

(If anything, the biggest problem in this scenario is human immune system, which is ridiculously good at dealing with nanoscale threats. This makes the best bet for any rogue actor, whether AGI or human, to repurpose one of the pathogens that has been already tuned by natural selection to work on us.)

BTW. the first link omits this part of the quote, which I find humorously relevant:

> (Back when I was first deploying this visualization, the wise-sounding critics said "Ah, but how do you know even a superintelligence could solve the protein folding problem, if it didn't already have planet-sized supercomputers?" but one hears less of this after the advent of AlphaFold 2, for some odd reason.)

> Just to clarify: Do you have deep expertise in biotech?

Deep? No. Undergraduate-level in biomedical engineering, yes, plus some books and courses on genetics, because molecular biology is a topic I very much enjoy learning about.


Well, I cannot help your inclination to take Eliezer's side, but I would like to point out all the anti-physics stuff in the original article:

1) Eliezer makes up a term ("diamondoid bacteria"), and the current scientific understanding is that we have no methods to perform nanoscale manipulations of any material that would be understood as being "diamondoid". Someone else already went through the pains of comparing the fiction to current understanding of science here: https://forum.effectivealtruism.org/posts/g72tGduJMDhqR86Ns/... The TL;DR is: There won't be anything like a Drexler nanobot thing on any realistic time horizon, if ever.

2) The described scenario involves an AI - reasoning purely from human input, without access to any empirical experiments - succeeding at building a "nanofactory" which then builds the bacteria. The author of the above article phrases it very well:

"First, forget the dream of advances in theory rendering experiment unnecessary. As I explained in a previous post, the quantum equations are just way too hard to solve with 100% accuracy, so approximations are necessary, which themselves do not scale particularly well."

All our theoretical understanding of everything is often a poor abstraction of reality, and it is common for even our highest-quality models for computational fluid dynamics to diverge drastically from real-world experiments. There's simply no way an AI will "reason/simulate itself through a bunch of experiments". That's not how our world and our physics work, and largely what I mean when I accuse the x-Risk crowd of being "anti-physics".

The theory of building a ballpoint pen tip is very simple. The actual execution is fiendishly hard, and only a few industrialized nations have mastered it.

My experience with Eliezer's writings, and a large number of x-risk adjacent people, is that they have no experience with real-world engineering or any experimental science. They simply haven't internalized that the map isn't the territory, and that the real world is messy and fundamentally unpredictable. The intellectual feats they ascribe to an AI aren't far off from the AI simply finding a shortcut to calculating the trajectory of all atoms in the atmosphere and then convincing a butterfly to flap its wings at precisely the right time, so that an enormous game of billiards is unleashed to build a global Maxwell's demon which incinerates one half of the world. Conceivable, if you neglect that (a) you can't gather the required information and (b) you can't perform the compute required.

Anyhow, my suspicion is that if the original nanobot madness didn't convince you, nothing I can write will either :-) so I think I'll excuse myself from this discussion.


The terminology there does appear interesting, but overall, yes, it does sound like a modified virus-like implementation. I would not be sure what to make of it without looking deeper into it.


How else do you propose that we pump the stock? If it’s not blockchain, NFTs, AI, GenAI, then we’ll need something. Stock won’t pump itself.


The depressing part is that the current pump beats buying Coca-Cola.


This. All these articles written about LLMs exploiting zero days completely miss the point. No researcher uses it this way; they use it to explore the semantic meaning of the code to discover and confirm vulnerabilities. It is a more easily used SAST.


When established SMEs in VR or other domains start getting good results with LLMs, I’m happy to take a look.

Until then, I remain suspicious of too-excited opinions on this topic because it is socially cheap and advantageous to grab onto hype waves, especially ones of this magnitude. It is also easy to underestimate the knowledge of SMEs if you don’t have the requisite domain knowledge yourself, and believe that LLMs get those details right.


> all vulnerabilities were past the GPT-4 cutoff date at the time of experimentation

I thought that was proven inaccurate at some point? Like the model has certain information mislabelled by date, so has more newer information than the stated cutoff would suggest.


Maybe through chat, but through the API the models should be frozen; they are versioned by date. Otherwise they would be much more difficult to use in an automated manner, if they were changing slightly over time.


is this working because agents are just another way of chunking context across models to keep them from drifting off topic?


Agents "work" because (in a well-designed agentic system) their context isn't polluted by tokens that aren't meaningful to the work they're doing. Their attention mechanisms are less likely to drift around with each new token if the subject matter largely stays the same.
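A minimal illustration of that point (the message format loosely follows the common chat-API shape; the specifics are assumptions, not any particular framework's API):

```python
def fresh_context(system_prompt: str, task: str) -> list[dict]:
    # Each sub-agent starts from only its own instructions and task,
    # not the planner's full transcript.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task},
    ]

# A long shared transcript full of tokens irrelevant to the SQLi task...
shared_history = [{"role": "user", "content": "unrelated XSS chatter"}] * 50

# ...versus the isolated context the SQLi agent actually sees.
sqli_ctx = fresh_context("You probe for SQL injection.", "Test the login form.")
assert len(sqli_ctx) == 2  # attention isn't diluted by 50 stale turns
```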


OK, so exactly what I said, then.

I'm sure it's just me (and not directed at your reply), but I find 'agent' to be a very obnoxious term that obfuscates what is really going on. It's not a magical being working on your behalf with agency; it's that there are inherent limitations to attention, and overcoming them usually involves having multiple models cooperate on the inputs and outputs.

The number of agents is not necessarily related to the number of distinct jobs you need done; after all, a single job could still perform best with multiple "agents" due to limitations in attention.


Multiple models cooperating doesn't sound much better though.

In the end it could be the same model, or even the same single instance, where it just gets triggered with different prompts in sequence.

The response to the first prompt decides on 10 new tasks to be done in sequence with different prompts; you run them, then do a final prompt with the results of those.


That's a fair point, multiple models is a potential implementation detail for parallelization - I do think it's fair to assume you are "restarting" the model or clearing context by some means between runs otherwise I don't think it achieves the goal of groomed attention.


That gets really philosophical here. What is a single run actually?

Because each token is generated one by one, with all the previous tokens as input.

If we remove one token from the context in the past does it make it a new "run"?

What if we have a summarization memory system where, in order to keep the context size small, after a certain length it will start to summarize/compress the beginning until the context size is good?

Then the whole input and context is constantly changing/evolving.

You could have 100s of different randomly selected instances generating new tokens one by one, each taking (the last input + the token generated by another instance) as input.


I think it comes down to the goal you are trying to achieve: maximize use of available attention while minimizing drift. The experiment seems clear, and the outcome being tested is the resulting generation.


I wonder if they could also be used to optimise code? So that iOS can become faster and less memory hungry?


Tell me, who will pay a pentester $50 an hour to look for zero days in these targets?



