Hacker News

What's your take on Anthropic's 'Tracing the thoughts of a large language model'? [0]

> To write the second line, the model had to satisfy two constraints at the same time: the need to rhyme (with "grab it"), and the need to make sense (why did he grab the carrot?). Our guess was that Claude was writing word-by-word without much forethought until the end of the line, where it would make sure to pick a word that rhymes. We therefore expected to see a circuit with parallel paths, one for ensuring the final word made sense, and one for ensuring it rhymes.

> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.

This is an older model (Claude 3.5 Haiku) with no test-time compute.
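The contrast between the two hypotheses quoted above can be sketched in a few lines of toy Python. The rhyme table and function names are mine, purely illustrative, and nothing like the model's actual mechanism:

```python
# Toy contrast between the two decoding strategies described above.
# The rhyme table is made up for illustration.
RHYMES = {"grab it": ["rabbit", "habit"]}

def improvise_then_rhyme(prev_ending, words):
    """Hypothesis 1: write word-by-word, only forcing a rhyme at the end."""
    line = words[:-1]                    # commit to most of the line first...
    line.append(RHYMES[prev_ending][0])  # ...then swap in a rhyming last word
    return " ".join(line)

def plan_then_write(prev_ending):
    """Hypothesis 2 (what the tracing suggests): pick the rhyme word first,
    then construct a line that makes sense ending on it."""
    target = RHYMES[prev_ending][0]      # plan the final word up front
    return f"he saw a carrot and a hungry {target}"
```

The second function reaches a sensible line more reliably because the hard constraint (the rhyme) is fixed before any other word is chosen.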

[0]: https://www.anthropic.com/news/tracing-thoughts-language-mod...



What is called "planning" or "thinking" here doesn't seem conceptually much different to me than going from a naive breadth-first-search-based Dijkstra shortest-path search to adding a heuristic that makes it search in a particular direction first and calling it A*. In both cases you're adding another layer to an existing algorithm in order to make it more effective. Doesn't make either AGI.
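Concretely, the only difference between the two algorithms is one term in the priority-queue key (toy sketch of my own, not from the article):

```python
import heapq

def shortest_path(graph, start, goal, heuristic=lambda n: 0):
    """Dijkstra when the heuristic is 0 everywhere; A* otherwise.
    graph: dict mapping node -> list of (neighbor, edge_cost) pairs."""
    frontier = [(heuristic(start), 0, start)]
    best = {start: 0}
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, step in graph[node]:
            new_cost = cost + step
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                # The only change vs. Dijkstra: bias the queue by the heuristic.
                heapq.heappush(frontier,
                               (new_cost + heuristic(neighbor), new_cost, neighbor))
    return None
```

With an admissible heuristic the answer is unchanged; only the exploration order differs, which is the "extra layer" the comparison is pointing at.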

I'm really no expert in neural nets or LLMs, so this is not an expert opinion, but as a CS major reading that blog post from Anthropic, I just cannot see how they provided any evidence of "thinking". To me it's pretty aggressive marketing to call this "thinking".


> In both cases you're adding another layer to an existing algorithm in order to make it more effective. Doesn't make either AGI.

Yet. The human mind is a big bag of tricks. If the creators of AI can enumerate a large enough list of capabilities and implement them, then the product can be as good as 90% of humans at a fraction of the cost and a billion times the speed - and then it doesn't matter whether it's AGI or not. It will have economic consequences.


And make them work together. It's not just having a big bag of tricks; it's also knowing which trick to pull out when. (And that may just be pulling out a trick, trying it, and knowing when the results aren't good enough, and so trying a different one.)

The observant will note that the word "knowing" kept appearing in the previous paragraph. Can that knowing also be reduced to LLM-like tricks? Or is it an additional step?


It's sufficient to appear to know. My washing machine "knows" when my clothes are dry.


They definitely do strain the neurology and thinking metaphors in that article. But the Dijkstra and A* comparison is the flip side of that same coin: Anthropic isn't trying to make the model more effective here, and definitely not trying to argue for anything AGI-related.

Either way: they're tampering with the inference process, by turning circuits in the LLM on and off, in an attempt to show that those circuits are associated with a specific function. [0]
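The real intervention operates on learned features via attribution graphs; as a very loose mechanical sketch of what "turning a circuit off" (or steering it) can mean, here's the projection arithmetic in numpy. This is my own simplification, not Anthropic's code:

```python
import numpy as np

def ablate_feature(hidden_state, feature_direction):
    """Remove one feature's component from a hidden-state vector by
    projecting its (unit-normalized) direction out of the state."""
    d = feature_direction / np.linalg.norm(feature_direction)
    return hidden_state - np.dot(hidden_state, d) * d

def amplify_feature(hidden_state, feature_direction, scale):
    """Steer a feature up or down by adding a multiple of its direction."""
    d = feature_direction / np.linalg.norm(feature_direction)
    return hidden_state + scale * d
```

Re-running the forward pass with the ablated state and watching which outputs change is the basic logic of attributing a function to a feature.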

They noticed that circuits related to a token that only becomes relevant ~8 tokens ahead were already active on the newline token. Instead of only looking at the sequence of tokens generated so far (i.e., backwards) and producing the next token from that information, the model activates circuits tied not just to the immediate next token, but to specific tokens a handful of positions later.

So information about more than just the next upcoming token (including a reference to one specific future token) is being cached at the newline token. I wouldn't call that thinking, but I don't think calling it planning is misguided. Caching this sort of information in the hidden state would be an emergent feature, rather than one deliberately produced by a specific training method, unlike with models that do test-time compute. (The DeepSeek-R1 paper is an example, with a very direct aim at turbocharging test-time compute, aka 'reasoning'. [1])

They defined the function of a circuit using their circuit-tracing method, which is open source, so you can try it out for yourself. [2] Here's the method in short: [3]

> Our feature visualizations show snippets of samples from public datasets that most strongly activate the feature, as well as examples that activate the feature to varying degrees interpolating between the maximum activation and zero.

> Highlights indicate the strength of the feature’s activation at a given token position. We also show the output tokens that the feature most strongly promotes / inhibits via its direct connections through the unembedding layer (note that this information is typically more meaningful for features in later model layers).
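Mechanically, the "most strongly activate" part of that procedure boils down to ranking dataset snippets by their peak per-token activation. The data layout below is hypothetical, just to make the idea concrete:

```python
def top_activating_snippets(examples, k=3):
    """examples: list of (snippet, per_token_activations) pairs.
    Rank snippets by the peak activation any of their tokens reaches;
    the top k are the 'max-activating examples' shown for a feature."""
    scored = sorted(examples, key=lambda e: max(e[1]), reverse=True)
    return [snippet for snippet, _ in scored[:k]]
```

Reading those top snippets side by side is how a human-interpretable label gets attached to an otherwise anonymous feature.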

[0]: https://transformer-circuits.pub/2025/attribution-graphs/bio...

[1]: https://arxiv.org/pdf/2501.12948

[2]: https://github.com/safety-research/circuit-tracer

[3]: https://transformer-circuits.pub/2025/attribution-graphs/met...


Generalize the concept from next-token prediction to upcoming-tokens prediction and the rest still applies. LLMs are still incredibly poor at symbolic thought and at following multi-step algorithms, and as a non-ML person I don't really see what in the LLM mechanism would provide such power. Or maybe we're just another 1000x of scale away and symbolic thought will emerge at some point.

Personally, I expect LLMs to end up as a mere part of whatever is invented later.



