
I know that this is a typical test of an LLM's ability to reason, but I wonder how much time could be saved by teaching an LLM how to recognise the type of problem that it's inherently bad at, bundling a Python interpreter, and asking it to write a short function to solve the problem. Are we not pushing a boulder uphill?
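A toy illustration of the "delegate to code" pattern described above: instead of asking the model to count characters token by token (a task LLMs are notoriously bad at), have it emit a short function and run that instead. The function here is hypothetical, just a stand-in for what the model would write.

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of `letter` in `word` -- trivial for code,
    error-prone for token-level LLM reasoning."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # prints 3
```

The point is that the interpreter is exact where the model's next-token sampling is merely probable.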


> teaching an LLM how to recognise the type of problem that it's inherently bad at

Solving this is the actual hard part, and is either adjacent to or even equivalent to solving the problem of LLMs hallucinating. ChatGPT already includes a Python interpreter tool, which can be used if the context indicates it's appropriate.
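As a sketch of why routing is the hard part: here is a deliberately naive keyword heuristic for deciding when to hand a prompt to a code tool. This is not ChatGPT's actual logic (real systems let the model itself emit a tool call); it only illustrates how brittle a fixed recogniser is.

```python
import re

# Toy router: if the prompt looks computational, send it to a code tool.
CODE_HINTS = re.compile(r"\b(count|sort|sum|average|digits?|anagram|prime)\b", re.I)

def route(prompt: str) -> str:
    """Return which backend a toy dispatcher would pick for this prompt."""
    return "python_tool" if CODE_HINTS.search(prompt) else "plain_llm"

print(route("How many r's are in strawberry? Count them."))  # python_tool
print(route("Write a poem about autumn."))                   # plain_llm
```

Any fixed list of hints will misroute novel problems, which is exactly the recognition problem the comment calls hard.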


I suppose my question is pointing to another, which is: can one make an LLM that doesn't hallucinate? Isn't that problem inherent to the way that LLMs work? Obviously we can try to clean the data so there isn't any nonsense fed into it, but that'll only get you so far with a probabilistic, stochastic system. As an LLM once told me "Some experts argue that hallucination is an innate limitation of LLMs, akin to confabulation in humans, where false memories are created without the intention to deceive". I'm not sure if I believe that though.


For LLMs, no. The explanation it gave you is also wrong: it has nothing to do with 'false memories' and everything to do with how LLMs work.

Here is the paper.

https://arxiv.org/abs/2401.11817

RAG and fine tuning improve domain specificity and may reduce the problem to a level where you don't care, but it will always be there.
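To make the RAG claim concrete, here is a toy sketch of the retrieval step: ground the prompt in stored text before the model answers. The "retriever" is naive word overlap over an in-memory corpus; real systems use vector search, and the corpus strings here are purely illustrative.

```python
# Hypothetical two-document corpus standing in for a real knowledge base.
CORPUS = [
    "BitNet b1.58 quantizes weights to ternary values.",
    "RAG prepends retrieved documents to the prompt.",
]

def retrieve(query: str) -> str:
    """Return the corpus document sharing the most words with the query."""
    qwords = set(query.lower().split())
    return max(CORPUS, key=lambda d: len(qwords & set(d.lower().split())))

print(retrieve("How does RAG change the prompt?"))
```

Retrieval constrains what the model says, which is why it reduces (but cannot eliminate) hallucination.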

Clean data would help reduce the incidence, possibly to a level that is more usable, but it also doesn't remove the problem.

Considering next-token prediction as serial runs of multi-tape Turing machines, with the previous output appended to the input, can help.
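The framing above can be sketched as an autoregressive loop: each "run" of a fixed machine consumes the tape so far and appends one symbol. The `model` function here is an assumed toy predictor, not a real LLM; only the loop structure matters.

```python
def model(tokens):
    """Toy deterministic 'predictor': next token is the length mod 10."""
    return str(len(tokens) % 10)

def generate(prompt, steps):
    tokens = list(prompt)
    for _ in range(steps):            # each iteration = one run of the machine
        tokens.append(model(tokens))  # previous output becomes new input
    return "".join(tokens)

print(generate("ab", 4))  # prints ab2345
```

Each step is a fresh run over the extended tape, which is what makes the serial-TM analogy apt.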

Especially if you consider Microsoft's BitNet b1.58, which requires full precision for training but can reduce weights to ternary values (essentially the sign components) for inference.
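A sketch of that quantization idea, assuming the BitNet b1.58 recipe of mapping each full-precision weight to {-1, 0, +1} scaled by the mean absolute value (the exact details are in the paper; this is illustrative only):

```python
def ternarize(weights, eps=1e-8):
    """Map full-precision weights to {-1, 0, +1} via mean-|w| scaling."""
    gamma = sum(abs(w) for w in weights) / len(weights)  # mean absolute value
    return [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]

print(ternarize([0.9, -0.05, 0.4, -1.2]))  # prints [1, 0, 1, -1]
```

Small weights collapse to 0 and the rest keep only their sign, so inference needs no multiplications at all.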

Unfortunately, all the paths I have to explain this require graduate-level complexity theory and/or differential geometry. Or you relive the Brouwer–Hilbert controversy by trying the logic path.



