I constantly see top models (Opus 4.5, Gemini 3) get a stroke mid-task: they'll solve the problem correctly in one place, or have a correct solution that just needs to be reapplied elsewhere in context, and then completely miss the mark somewhere else. "Lack of intelligence" is very much a limiting factor. Gemini especially will get into random reasoning loops; reading its thinking traces, it gets unhinged pretty fast.
Not to mention it's super easy to gaslight these models: just assert something wrong with a vaguely plausible explanation and you get no pushback or reasoning validation.
So I know you qualified your post with "for your use case", but personally I would very much like more intelligence from LLMs.