I really hate this reductive, facile, "um akshually" take. If the text that the text-generating tool generates contains reasoning, then the text-generating tool can be said to be reasoning, can't it?
That's like saying "humans aren't supposed to reason, they're supposed to make sounds with their mouths".
At some point, if you need to generate better text, you need to start building a model of how the world works, along with some amount of reasoning. The "it's just a token generator" argument misses this part. That said, I don't think just scaling LLMs is going to get us to AGI, but I don't have any real arguments to support that.
I’m not a fan of the talking parrot argument, especially when you’re pointing it at models of scale.
The only thing separating a talking parrot from humans is our accuracy in shaping our words to the context in which they’re spoken.
Sure, it’s easy to liken a low-resource model to a talking parrot; the output seems no better than selective repetition of training data. But is that really so different from a baby whose first words are mimicry of the environment around them?
I would argue that as we learn language, we implicitly develop the neural circuitry to keep improving our lexical output: circuitry for concepts like foresight, reasoning, emotion, and logic. And while we can take explicit action to teach these ideas, they develop naturally on their own as well.
I don’t think language models, especially at scale, are much different: they seem to acquire similar implicit circuitry as they are exposed to more data. As I see it, the main difference in what exactly that circuitry accomplishes, and how it shows up in the final output, has more to do with the limited kinds of data we can provide and the limitations of the fine-tuning we can apply on top.
Humans would seem to share a lot in common with talking parrots; we just have much more capable hardware for selecting what we repeat.
The talking parrot can only answer by repeating something it heard before.
Another question you could ask is “What’s the difference between a conversation between 2 people and a conversation between 2 parrots who can answer any question?”
I feel the use of the word "parrot" is unintentionally apt, given that parrots were long thought to be mere mimics but were ultimately shown to have (at least the capacity for) real linguistic understanding.
LLMs fail at so many reasoning tasks (not unlike humans, to be fair) that they are either incapable of reasoning or really poor at it. As far as reasoning machines go, I suspect LLMs will be a dead end.
Reasoning here meaning, for example: given a description of a situation or issue, being able to answer questions about its implications, applications, and outcomes. In my experience, things quickly degenerate into technobabble for non-trivial issues (also not unlike humans).
If you're contending that LLMs are incapable of reasoning, you're saying that there's no reasoning task that an LLM can do. Is that what you're saying? Because I can easily find an example to prove you wrong.
It could be that all the reasoning displayed is just regurgitated existing information, in which case there would be no reasoning at all. But that aside, what I meant is being able to reason in any consistent way. A machine that only sometimes gets an addition right isn't really capable of addition.
The former is easy to test: just make up your own puzzles and see if it can solve them.
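For what it's worth, that kind of test is easy to script. Here's a minimal sketch, assuming the OpenAI Python client (openai>=1.0); the model name, the nonsense puzzle, and the trial count are my own illustrative choices, not anything established in this thread:

```python
# Minimal sketch: probe an LLM with a made-up puzzle and check consistency.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

# Nonsense names make verbatim recall from training data unlikely.
puzzle = (
    "Zog is taller than Mib. Mib is taller than Quux. "
    "Who is the shortest? Answer with just the name."
)
expected = "quux"

# Run the same puzzle several times: a machine that only sometimes
# gets it right arguably can't do the underlying reasoning reliably.
trials, correct = 10, 0
for _ in range(trials):
    reply = client.chat.completions.create(
        model="gpt-4",  # placeholder; any chat-capable model works
        messages=[{"role": "user", "content": puzzle}],
        temperature=1.0,  # keep sampling noise in, to test consistency
    )
    answer = (reply.choices[0].message.content or "").strip().lower()
    correct += expected in answer

print(f"{correct}/{trials} correct")
```

Swapping in freshly generated puzzles each run gets at the "existing information" worry too, since the model can't have memorized them.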
"Incapable of reasoning" doesn't mean "only solves some logic puzzles". Hell, GPT-4 is better at reasoning than a large number of people. Would you say that a good percentage of humans are poor at reasoning too?
Not just logic puzzles but also applying information, and, yes, I tried a few things.
People/humans tend to be pretty poor at it, too (though training can help), as it isn't easy to really think through and solve things. We don't have a general recipe to follow there, and neither do LLMs, it seems (otherwise they wouldn't fail).
What I am getting at is that, as far as reasoning machines are concerned, I'd want one to be like a pocket calculator is for arithmetic: it doesn't fail except in rare cases, and it doesn't inherit human weaknesses.