Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
The "confident idiot" problem: Why AI needs hard rules, not vibe checks (steerlabs.substack.com)
219 points by steerlabs 5 hours ago | hide | past | favorite | 237 comments




The thing that bothers me the most about LLMs is how they never seem to understand "the flow" of an actual conversation between humans. When I ask a person something, I expect them to give me a short reply which includes another question/asks for details/clarification. A conversation is thus an ongoing "dance" where the questioner and answerer gradually arrive to the same shared meaning.

LLMs don't do this. Instead, every question is immediately responded to with extreme confidence with a paragraph or more of text. I know you can minimize this by configuring the settings on your account, but to me it just highlights how it's not operating in a way remotely similar to the human-human one I mentioned above. I constantly find myself saying, "No, I meant [concept] in this way, not that way," and then getting annoyed at the robot because it's masquerading as a human.


LLMs all behave as if they are semi-competent (yet eager, ambitious, and career-minded) interns or administrative assistants, working for a powerful CEO-founder. All sycophancy, confidence and positive energy. "You're absolutely right!" "Here's the answer you are looking for!" "Let me do that for you immediately!" "Here is everything I know about what you just mentioned." Never admitting a mistake unless you directly point it out, and then all sorry-this and apologize-that and "here's the actual answer!" It's exactly the kind of personality you always see bubbling up into the orbit of a rich and powerful tech CEO.

No surprise that these products are all dreamt up by powerful tech CEOs who are used to all of their human interactions being with servile people-pleasers. I bet each and every one of them are subtly or overtly shaped by feedback from executives about how they should respond to conversation.


I agree entirely, and I think it's worthwhile to note that it may not even be the LLM that has that behavior. It's the entire deterministic machinery between the user and the LLM that creates that behavior, with the system prompt, personality prompt, RLHF, temperature, and the interface as a whole.

LLMs have an entire wrapper around them tuned to be as engaging as possible. Most people's experience of LLMs is a strongly social media and engagement economy influenced design.


Analogies of LLMs to humans obfuscates the problem. LLMs aren't like humans of any sort in any context. They're chat bots. They do not "think" like humans and applying human-like logic to them does not work.

I don't think these LLMs were explicitly designed based on the CEO's detailed input that boils down to 'reproduce these servile yes-men in LLM form please'.

Which makes it more interesting. Apparently reddit was a particularly hefty source for most LLMs; your average reddit conversation is absolutely nothing like this.

Separate observation: That kind of semi-slimey obsequious behaviour annoys me. Significantly so. It raises my hackles; I get the feeling I'm being sold something on the sly. Even if I know the content in between all the sycophancy is objectively decent, my instant emotional response is negative and I have to use my rational self to dismiss that part of the ego.

But I notice plenty of people around me that respond positively to it. Some will even flat out ignore any advice if it is not couched in multiple layers of obsequious deference.

Thus, that raises a question for me: Is it innate? Are all people placed on a presumably bell-curve shaped chart of 'emotional response to such things', with the bell curve quite smeared out?

Because if so, that would explain why some folks have turned into absolute zealots for the AI thing, on both sides of it. If you respond negatively to it, any serious attempt to play with it should leave you feeling like it sucks to high heavens. And if you respond positively to it - the reverse.

Idle musings.


The servile stuff was trained into them with RLHF with the trainers largely being low-wage workers in the global south. That's also where some of the other stuff like excessive em-dash stuff came from. I think it's a combination of those workers anticipating how they would be expected to respond by a first-world employer, and also explicit instructions given to them about how the robot should be trained.

I suspect a lot of the em-dash usage also comes from transcriptions of verbal media. In the spoken word, people use the kinds of asides that elicit an em-dash a lot.

This is a really interesting observation, as someone who feels disquiet as the obsequiousness, but have been getting used to just mentally skipping over the first paragraph that's put an interesting spin on my behaviour

Thanks!


It’s not innate. Purpose trained llm can be quite stubborn and not very polite.

The problem with these LLM chat-bots is they are too human, like a mirror held up to the plastic-fantastic society we have morphed into. Naturally programmed to serve as a slave to authority, this type of fake conversation is what we've come to expect as standard. Big smiles everyone! Big smiles!!

Nah. Talking like an LLM would get you fired in a day. People are already suspicious of ass-kissers, they hate it when they think people are not listening to them, and if you're an ass-kisser who's not listening and is then wrong about everything, they want you escorted out by security.

The real human position would be to be an ass-kisser who hangs on every word you say, asks flattering questions to keep you talking, and takes copious notes to figure out how they can please you. LLMs aren't taking notes correctly yet, and they don't use their notes to figure out what they should be asking next. They're just constantly talking.


> LLMs all

Sounds like you don't know how RLHF works. Everything you describe is post-training. Base models can't even chat, they have to be trained to even do basic conversational turn taking.


thats the audience! Incompetent CEOS!

Nearly every woman I know who is an English as a second language speaker is leaning hard into these things currently to make their prose sound more natural. And that has segued into them being treated almost as a confidant or a friend.

As flawed as they are currently, I remain astounded that people think they will never improve and that people don't want a plastic pal. who's fun to be with(tm).

I find them frustrating personally, but then I ask them deep technical questions on obscure subjects and I get science fiction in return.


This is partly true, partly false, partly false in the opposite direction, with various new models. You really need to keep updating and have tons of interactions regularly in order to speak intelligently on this topic.

maybe this is also part of the problem? Once I learn the idiosyncrasies of a person I don't expect them to dramatically change overnight, I know their conversational rhythms and beat; how to ask / prompt / respond. LLMs are like a eager sycophantic intern how completely changes their personality from conversation to conversation, or - surprise - exactly like a machine

>LLMs are like a eager sycophantic intern how completely changes their personality from conversation to conversation

Again, this isn't really true with some recent models. Some have the opposite problem.


> LMs don't do this. Instead, every question is immediately responded with extreme confidence with a paragraph or more of text.

Having just read a load of Quora answers like this, which did not cover the thing I was looking for, that is how humans on the internet behave and how people have to write books, blog posts, articles, documentation. Without the "dance" to choose a path through a topic on the fly, the author has to take the burden of providing all relevant context, choosing a path, explaining why, and guessing at any objections and questions and including those as well.

It's why "this could have been an email" is a bad shout. The summary could have been an email, but the bit which decided on that being the summary would be pages of guessing all the things which what might have been in the call and which ones to include or exclude.


This is a recent phenomenon. It seems most of the pages today are SEO optimized LLM garbage with the aim of having you scroll past three pages of ads.

THe internet really used to be efficient and i could always find exactly what i wanted with an imprecise google search ~ 15 years ago.


Don’t you get this today with AI Overviews summarizing everything on top of most Google results?

The AI Overviews are... extremely bad. For most of my queries, Google's AI Overview misrepresents its own citations, or almost as bad, confidently asserts a falsehood or half-truth based on results that don't actually contain an answer to my search query.

I had the same issue with Kagi, where I'd follow the citation and it would say the opposite of the summary.

A human can make sense of search results with a little time and effort, but current AI models don't seem to be able to.


It’s fine about 80% of the time, but the other 20% is a lot harder to answer because of lower quality results.

From a UX perspective, the AI overview summary being a multi-paragraph summary makes sense since that was a single query that isn't expected to have conversational context. Where it does not make sense is in conversation-based interfaces. Like, the most popular product is literally called "chat".

"I ask a short and vague question and you response with a scrollbar-full of information based on some invalid assumptions" is not, by any reasonable definition, a "chat".


You'd think with the reputation of LLMs being trained on Twitter (pre-Musk radicalization) and Reddit, they'd be better at understanding normal conversation flow since twitter requires short responses and Reddit... while Wall of Text happens occasionally, it's not the typical cadence of the discussion.

Reddit and Twitter don't have human conversations. They have exchanges of confident assertions followed with rebuttals. In fact, both of our comments are perfect demonstrations of exactly that. Quite reflective of how LLMs behave — except nobody wants to "argue" with an LLM like Twitter and Reddit users want to.

This is not how humans converse in human social settings. The medium is the message, as they say.


Twitter, Reddit, HN don't always have the consistency of conversation that two people talking do.

Even here, I'm responding to you on a thread that I haven't been in on previously.

There's also a lot more material out there in the format of Stack Exchange questions and answers, Quora posts, blog posts and such than there is for consistent back and forth interplay between two people.

IRC chat logs might have been better...ish.

The cadence for discussion is unique to the medium in which the discussion happens. What's more, the prompt may require further investigation and elaboration prior to a more complete response, while other times it may be something that requires story telling and making it up as it goes.


Interesting. Like many people here, I've thought a great deal about what it means for LLMs to be trained on the whole available corpus of written text, but real world conversation is a kind of dark matter of language as far as LLMs are concerned, isn't it? I imagine there is plenty of transcription in training data, but the total amount of language use in real conversational surely far exceeds any available written output and is qualitatively different in character.

This also makes me curious to what degree this phenomenon manifests when interacting with LLMs in languages other than English? Which languages have less tendency toward sycophantic confidence? More? Or does it exist at a layer abstracted from the particular language?


Yes you're totally right! I misunderstood what you meant, let me write six more paragraphs based on a similar misunderstanding rather than just trying to get clarification from you

My favorite is when it bounces back and forth between the same two wrong answers, each time admitting that the most recent answer is wrong and going back to the previous wrong answer.

Doesn't matter if you tell it "that's not correct and neither is ____ so don't try that instead," it likes those two answers and it's going to keep using them.


The false info baked into its context at that point in the conversation and it will get stuck in a local minima trying to generate a response to the given context.

Ha! Just experienced this. It was very frustrating.

They really need to add a "punish the LLM" button.

Once the context is polluted with wrong information, it is almost impossible to get it right again.

The only reliable way to recover is to edit your previous question to include the clarification, and let it regenerate the answer.


Its not a magic technology, they can only represent data they were trained on. Naturally most represented data in their training data is NOT conversational. Consider that such data is very limited and who knows how it was labeled if at all during pretraining. But with that in mind, LLM's definitely can do all the things you describe, but a very robust and well tested system prompt has to be used to coax this behavior out. Also a proper model has to be used, as some models are simply not trained for this type of interaction.

Training data is quite literally weighted this way - long responses on Reddit have lots of tokens, and brief responses don't get counted nearly as much.

The same goes for "rules" - you train an LLM with trillions of tokens and try to regulate its behavior with thousands. If you think of a person in high school, grading and feedback is a much higher percentage of the training.


Not to mention that Reddit users seek "confident idiots". Look at where the thoughtful questions that you'd expect to hear in a human social setting end up (hint: Downvoted until they disappear). Users on Reddit don't want to have to answer questions. They want to read the long responses that they can then nitpick. LLMs have no doubt picked up on that in the way they are trained.

> The thing that bothers me the most about LLMs is

What bothers me the most is the seemingly unshakable tendency of many people to anthropomorphise this class of software tool as though it is in any way capable of being human.

What is it going to take? Actual, significant loss of life in a medical (or worse, military) context?


It's the fact that these are competent human language word salad generators that messes with human psychology.

Cursor Plan mode works like this. It restricts the LLMs access to your environment and will allow you to iteratively ask and clarify and it’ll piece together a plan that it allows you to review before it takes any action.

ChatGPT deep research does this but it’s weird and forced because it asks one series of questions and then goes off to the races, spending a half hour or more building a report. It’s frustrating if you don’t know what to expect and my wife got really mad the first time she wasted a deep research request asking it “can you answer multiple series of questions?” Or some other functionality clarifying question.

I’ve found Crusor’s plan mode extremely useful, similar to having a conversation with a junior or offshore team member who is eager to get to work but not TOO eager. These tools are extremely useful we just need to get the guard rails and user experience correct.


ChatGPT offered a "robotic" personality which really improved my experience. My frustrations were basically decimated right away and I quickly switched to a more "You get out of it what you put in" mindset.

And less than two weeks in they removed it and replaced it with some sort of "plain and clear" personality which is human-like. And my frustrations ramped up again.

That brief experiment taught me two things: 1. I need to ensure that any robots/LLMs/mech-turks in my life act at least as cold and rational as Data from Star Trek. 2. I should be running my own LLM locally to not be at the whims of $MEGACORP.


Sort of a personal modified Butlerian Jihad? Robots / chatbots are fine as long as you KNOW they're not real humans and they don't pretend to be.

I never expected LLMs to be like an actual conversation between humans. The model is in some respects more capable and in some respects more limited than a human. I mean, one could strive for an exact replica of a human -- but for what purpose? The whole thing is a huge association machine. It is a surealistic inspiration generator for me. This is how it works at the moment, until the next break through ...

> but for what purpose?

I recently introduced a non-technical person to Claude Code, and this non-human behavior was a big sticking point. They tried to talk to Claude similar as to a human, presenting it one piece of information at a time. With humans this is generally beneficial, and they will either nod for you to continue or ask clarifying questions. With Claude this does not work well, you have to infodump as much as possible in each message

So even from a perspective of "how do we make this automaton into the best tool", a more human-like conversation flow might be beneficial. And that doesn't seem beyond the technological capabilities at all, it's just not what we encourage in today's RLHF


I often find myself in these situations where I'm afraid that if I don't finish infodumping everything in a single message, it'll go in the wrong direction. So what I've been doing is switching it back to Plan Mode (even when I don't need a plan as such), just as a way of telling it "Hold still, we're still having a conversation".

I do this with cursor ai too. I tell, don't change anything, let me hear out what you plan to fix and what you will change

I haven't tried claude, but Codex manages this fine as long as you prompt it correctly to get started.

A lazy example:

"This goal of this project is to do x. Let's prepare a .md file where we spec out the task. Ask me a bunch of questions, one at a time, to help define the task"

Or you could just ask it to be more conversational, instead of just asking questions. It will do that.


also, this is what chat-style interfaces encourage. Anything where the "enter" key sends the message instead of creating a paragraph block is just hell.

I'm prompting Gemini, and I write:

I have the following code, can you help me analyze it? <press return>

<expect to paste the code into my chat window>

but Gemnini is already generating output, usually saying "I'm waiting for you to enter the code"


Yeah, seems like current models might benefit from a more email-like UI, and this'll be more true as they get longer task time horizons.

Maybe we want a smaller model tuned for back and forth to help clarify the "planning doc" email. Makes sense that having it all in a single chat-like interface would create confusion and misbehavior.


Like many chat-style interfaces, it's typically shift-enter to insert a newline.

its so easy to accidentally hit enter though lol, I usually type larger prompts in my notes and copy paste then finished


I usually do the "drip feed" with ChatGPT, but maybe that's not optimal. Hmm, maybe info dump is a good thing to try.

There a recent(ish: May 2025) paper about how drip-feeding information is worse than restarting with a revised prompt once you realize details are missing.[0]

[0] https://arxiv.org/abs/2505.06120


Clarifying ambiguity in questions before dedicating more resources to search and reasoning about the answer seems both essential and almost trivial to elicit via RLHF.

I'd be surprised if you can't already make current models behave like that with an appropriate system prompt.


The disconnect is that companies are trying desperately to frame LLMs as actual entities and not just an inert tech tool. AGI as a concept is the biggest example of this, and the constant push to "achieve AGI" is what's driving a lot of stock prices and investment.

A strictly machinelike tool doesn't begin answers by saying "Great question!"


I don't want to talk to a computer like I would a human

They are purposely trained to be this way.

In a way it's benchmaxxing because people like subservient beings that help them and praise them. People want a friend, but they don't want any of that annoying friction that comes with having to deal with another person.


If you're paying per token then there is a big business incentive for the counterparty to burn tokens as much as possible.

Making a few pennies more from inference is not even on the radar of the labs making frontier models. The financial stakes are so much higher than that for them.

If I'll pay to get a fixed result, sure. I'd expect a Jevons paradox effect: if LLMs got me results twice as fast for the same cost, I'm going to use it more and end up paying more in total.

Maximizing the utility of your product for users is usually the winning strategy.


As long as there's no moat (and arguably current LLM inference APIs are far from having one), it arguably doesn't really matter what users pay by.

The only thing I care about are whether the answer helps me out and how much I paid for it, whether it took the model a million tokens or one to get to it.


Lately, ChatGPT 5.1 has been less guilty of this and sometimes holds off answering fully and just asks me to clarify what I meant.

a) I find myself fairly regularly irritated by the flow of human-human conversations. In fact, it's more common than not. Of course, I have years of practice handling that more or less automatically, so it rarely raises to the level of annoyance, but it's definitely work I bring to most conversations. I don't know about you but that's not really a courtesy I extend to the LLM.

b) If it is, in fact, just one setting away, then I would say it's operating fairly similarly?


Have you used Claude much? It often responds to things with questions

There are plenty of LLM services that have a conversational style. The paragraph blocks thing is just a style.

My favorite description of an LLM so far is of a typical 37-year-old male Reddit user. And in that sense, we have already created the AGI.

I didn't have the words to articulate some of my frustrations, but I think you summed it up nicely.

For example, there's been many times when they take it too literally instead of looking at the totality of the context and what was written. I'm not an LLM, so I don't have perfect grasp on every vocab term for every domain and it feels especially pandering when they repeat back the wrong word but put it in quotes or bold instead of simply asking if I meant something else.


Reflect a moment over the fact that LLMs currently are just text generators.

Also that the conversational behavior we see it’s just examples of conversations that we have the model to mimic so when we say “System: you are a helpful assistant. User: let’s talk. Assistant:” it will complete the text in a way that mimics a conversation?.

Yeah, we improved over that using reinforcement learning to steer the text generation into paths that lead to problem solving and more “agentic” traces (“I need to open this file the user talked about to read it and then I should run bash grep over it to find the function the user cited”), but that’s just a clever way we found to let the model itself discover which text generation paths we like the most (or are more useful to us).

So to comment on your discomfort, we (humans) trained the model to spill out answers (there are thousand of human being right now writing nicely though and formatted answers to common questions so that we can train the models on that).

If we try to train the models to mimic long dances into shared meaning we will probably decrease their utility. And we won’t be able anyway to do that because then we would have to have customized text traces for each individual instead of question-answers pairs.

Downvoters: I simplified things a lot here, in name of understanding, so bear with me.


> Reflect a moment over the fact that LLMs currently are just text generators.

You could say the same thing about humans.


No, you cannot. Our abstract language abilities (especially the written word part) are a very thin layer on top of hundreds of millions of years of evolution in an information dense environment.

No, you actually can't.

Humans existed for 10s to 100s of thousands of years without text. or even words for that matter.


I disagree: it is language that makes us human.

The human world model is based on physical sensors and actions. LLMs are based on our formal text communication. Very different!

Just yesterday I observed myself acting on an external stimulus without any internal words (this happens continuously, but it is hard to notice because we usually don't pay attention to how we do things): I sat in a waiting area of a cinema. A woman walked by and dropped her scarf without noticing. I automatically without thinking raised arm and pointer finger towards her, and when I had her attention pointed behind her. I did not have time to think even a single word while that happened.

Most of what we do does not involved any words or even just "symbols", not even internally. Instead, it is a neural signal from sensors into the brain, doing some loops, directly to muscle activation. Without going through the add-on complexity of language, or even "symbols".

Our word generator is not the core of our being, it is an add-on. When we generate words it's also very far from being a direct representation of internal state. Instead, we have to meander and iterate to come up with appropriate words for an internal state we are not even quite aware of. That's why artists came up with all kinds of experiments to better represent our internal state, because people always knew the words we produce don't represent it very well.

That is also how people always get into arguments about definitions. Because the words are secondary, and the further from the center of established meaning for some word you get the more the differences show between various people. (The best option is to drop insisting of words being the center of the universe, even just the human universe, and/or to choose words that have the subject of discussion more firmly in the center of their established use).

We are text generators in some areas, I don't doubt that. Just a few months ago I listened to some guy speaking to a small rally. I am certain that not a single sentence he said was of his own making, he was just using things he had read and parroted them (as a former East German, I know enough Marx/Engels/Lenin to recognize it). I don't want to single that person out, we all have those moments, when we speak about things we don't have any experiences with. We read text, and when prompted we regurgitate a version of it. In those moments we are probably closest to LLM output. When prompted, we cannot fall back on generating fresh text from our own actual experience, instead we keep using text we heard or read, with only very superficial understanding, and as soon as an actual expert shows up we become defensive and try to change the context frame.


You could, but you’d be missing a big part of the picture. Humans are also (at least) symbol manipulators.

The benchmarks are dumb but highly followed so everyone optimizes for the wrong thing.

When using an LLM for anything serious (such as at work) I have a standard canned postscript along the lines of “if anything about what I am asking is unclear or ambiguous, or if you need more context to understand what I’m asking, you will ask for clarification rather than try to provide an answer”. This is usually highly effective.

Claude doesn't really have this problem.

same experience. i try to learn with it but i can't really tell if what its teaching me is actually correct or merely making up when i challenge it with followup questions.

This drives me nuts when trying to bounce an architecture or coding solution idea off an LLM. A human would answer with something like "what if you split up the responsibility and had X service or Y whatever". No matter how many times you tell the LLM not to return code, it returns code. Like it can't think or reason about something without writing it out first.

> Like it can't think or reason about something without writing it out first.

Setting aside the philosophical questions around "think" and "reason"... it can't.

In my mind, as I write this, I think through various possibilities and ideas that never reach the keyboard, but yet stay within my awareness.

For an LLM, that awareness and thinking through can only be done via its context window. It has to produce text that maintains what it thought about in order for that past to be something that it has moving forward.

There are aspects to a prompt that can (in some interfaces) hide this internal thought process. For example, the ChatGPT has the "internal thinking" which can be shown - https://chatgpt.com/share/69278cef-8fc0-8011-8498-18ec077ede... - if you expand the first "thought for 32 seconds" bit it starts out with:

    I'm thinking the physics of gravity assists should be stable enough for me to skip browsing since it's not time-sensitive. However, the instructions say I must browse when in doubt. I’m not sure if I’m in doubt here, but since I can still provide an answer without needing updates, I’ll skip it.
(aside: that still makes me chuckle - in a question about gravity assists around Jupiter, it notes that its not time-sensitive... and the passage "I’m not sure if I’m in doubt here" is amusing)

However, this is in the ChatGPT interface. If I'm using an interface that doesn't allow internal self-prompts / thoughts to be collapsed then such an interface would often be displaying code as part of its working through the problem.

You'll also note a bit of the system prompt leaking in there - "the instructions say I must browse when in doubt". For an interface where code is the expected product, then there could be system prompts that also get in there that try to always produce code.


I have architectural discussions all the time with coding agents.

> Like it can't think or reason about something without writing it out first.

LLM's neither think nor reason at all.


Right, so LLM companies should stop advertising their models can think and reason.

But that would burst their valuation bubble as investors would realize it's a technology that already hit its realistic ceiling in usability.

> When I ask a person something, I expect them to give me a short reply which includes another question/asks for details/clarification. A conversation is thus an ongoing "dance" where the questioner and answerer gradually arrive to the same shared meaning.

You obviously never wasted countless hours trying to talk to other people on online dating apps.


We are trying to fix probability with more probability. That is a losing game.

Thanks for pointing out the elephant in the room with LLMs.

The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility.


I couldn't agree with you more.

I really do find it puzzling so many on HN are convinced LLM's reason or think and continue to entertain this line of reasoning. At the same time also somehow knowing what precisely the brain/mind does and constantly using CS language to provide correspondences where there are none. The simplest example being that LLM's somehow function in a similar fashion to human brains. They categorically do not. I do not have most all of human literary output in my head and yet I can coherently write this sentence.

As I'm on the subject LLM's don't hallucinate. They output text and when that text is measured and judged by a human to be 'correct' then it is. LLM's 'hallucinate' because that is literally what they can ONLY do, provide some output given some input. They don't actually understand anything about what they output. It's just text.

My paper and pen version of the latest LLM (quite a large bit of paper and certainly a lot of ink I might add) will do the same thing as the latest SOTA LLM. It's just an algorithm.

I am surprised so many in the HN community have so quickly taken to assuming as fact that LLM's think or reason. Even anthropomorphising LLM's to this end.


The factuality problem with LLMs isn't because they are non-deterministic or statistically based, but simply because they operate at the level of words, not facts. They are language models.

You can't blame an LLM for getting the facts wrong, or hallucinating, when by design they don't even attempt to store facts in the first place. All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C". The LLM wasn't designed to know or care that outputting "B" would represent a lie or hallucination, just that it's a statistically plausible potential next word.


>but simply because they operate at the level of words, not facts. They are language models.

Facts can be encoded as words. That's something we also do a lot for facts we learn, gather, and convey to other people. 99% of university is learning facts and theories and concept from reading and listening to words.

Also, even when directly observing the same fact, it can be interpreted by different people in different ways, whether this happens as raw "thought" or at the conscious verbal level. And that's before we even add value judgements to it.

>All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C".

And how do we know we don't do something very similar with our facts - make a map of facts and concepts and weights between them for retrieving them and associating them? Even encoding in a similar way what we think of as our "analytic understanding".


Animal/human brains and LLMs have fundamentally different goals (or loss functions, if you prefer), even though both are based around prediction.

LLMs are trained to auto-regressively predict text continuations. They are not concerned with the external world and any objective experimentally verifiable facts - they are just self-predicting "this is what I'm going to say next", having learnt that from the training data (i.e. "what would the training data say next").

Humans/animals are embodied, living in the real world, whose design has been honed by a "loss function" favoring survival. Animals are "designed" to learn facts about the real world, and react to those facts in a way that helps them survive.

What humans/animals are predicting is not some auto-regressive "what will I do next", but rather what will HAPPEN next, based largely on outward-looking sensory inputs, but also internal inputs.

Animals are predicting something EXTERNAL (facts) vs LLMs predicting something INTERNAL (what will I say next).


>Humans/animals are embodied, living in the real world, whose design has been honed by a "loss function" favoring survival. Animals are "designed" to learn facts about the real world, and react to those facts in a way that helps them survive.

Yes - but LLMs also get this "embodied knowledge" passed down from human-generated training data. We are their sensory inputs in a way (which includes their training images, audio, and video too).

They do learn in a batch manner, and we learn many things not from books but from a more interactive direct being in the world. But after we distill our direct experiences and throughts derived from them as text, we pass them down to the LLMs.

Hey, there's even some kind of "loss function" in the LLM case - from the thumbs up/down feedback we are asked to give to their answers in Chat UIs, to $5/hour "mechanical turks" in Africa or something tasked with scoring their output, to rounds of optimization and pruning during training.

>Animals are predicting something EXTERNAL (facts) vs LLMs predicting something INTERNAL (what will I say next).

I don't think that matters much, in both cases it's information in, information out.

Human animals predict "what they will say/do next" all the time, just like they also predict what they will encounter next ("my house is round that corner", "that car is going to make a turn").

Our prompt to an LLM serves the same role as sensory input from the external world plays to our predictions.


I think this is why I get much more utility out of LLMs with writing code. Code can fail if the syntax is wrong; small perturbations in the text (e.g. add a newline instead of a semicolon) can lead to significant increases in the cost function.

Of course, once an LLM is asked to create a bespoke software project for some complex system, this predictability goes away, the trajectory of the tokens succumbs to the intrinsic chaos of code over multi-block length scales, and the result feels more arbitrary and unsatisfying.

I also think this is why the biggest evangelists for LLMs are programmers, while creative writers and journalists are much more dismissive. With human language, the length scale over which tokens can be predicted is much shorter. Even the "laws" of grammar can be twisted or ignored entirely. A writer picks a metaphor because of their individual reading/life experience, not because its the most probable or popular metaphor. This is why LLM writing is so tedious, anodyne, sycophantic, and boring. It sounds like marketing copy because the attention model and RL-HF encourage it.


In a way though those things aren't so different as they might first appear. The factual answer is traditionally the most plausible response to many questions. They don't operate on any level other than pure language but there are a heap of behaviours which emerge from that.

Most plausible world model is not something stored raw in utterances. What we interpret from sentences is vastly different from what is extractable from mere sentences on their own.

Facts, unlike fabulations, require crossing experience beyond the expressions on trial.


Right, facts need to be grounded and obtained from reliable sources such as personal experience, or a textbook. Just because statistically most people on Reddit or 4Chan said the moon is made of cheese doesn't make it so.

But again, LLMs don't even deal in facts, nor store any memories of where training samples came from, and of course have zero personal experience. It's just "he said, she said" put into a training sample blender and served one word at a time.


> The factual answer is traditionally the most plausible response to many questions

Except in cases where the training data is more wrong than correct (e.g. niche expertise where the vox pop is wrong).

However, an LLM no more deals in Q&A than in facts. It only typically replies to a question with an answer because that itself is statistically most likely, and the words of the answer are just selected one at a time in normal LLM fashion. It's not regurgitating an entire, hopefully correct, answer from someplace, so just because it was exposed to the "correct" answer in the training data, maybe multiple times, doesn't mean that's what it's going to generate.

In the case of hallucination, it's not a matter of being wrong, just the expected behavior of something built to follow patterns rather than deal in and recall facts.

For example, last night I was trying to find an old auction catalog from a particular company and year, so thought I'd try to see if Gemini 3 Pro "Thinking" maybe had the google-fu to find it available online. After the typical confident sounding "Analysing, Researching, Clarifying .." "thinking", it then confidently tells me it has found it, and to go to website X, section Y, and search for the company and year.

Not surprisingly it was not there, even though other catalogs were. It had evidently been trained on data including such requests, maybe did some RAG and got more similar results, then just output the common pattern it had found, and "lied" about having actually found it since that is what humans in the training/inference data said when they had been successful (searching for different catalogs).


>Except in cases where the training data is more wrong than correct (e.g. niche expertise where the vox pop is wrong)

Same for human knowledge though. Learn from society/school/etc that X is Y, and you repeat X is Y, even if it's not.

>However, an LLM no more deals in Q&A than in facts. It only typically replies to a question with an answer because that itself is statistically most likely, and the words of the answer are just selected one at a time in normal LLM fashion.

And how is that different than how we build up an answer? Do we have a "correct facts" repository with fixed answers to every possibly question, or we just assemble our training data from a weighted graph (or holographic) store of factoids and memories, and our answers are also non deterministic?


We likely learn/generate language in an auto-regressive way at least conceptually similar to an LLM, but this isn't just self-contained auto-regressive generation...

Humans use language to express something (facts, thoughts, etc), so you can consider these thoughts being expressed as a bias to the language generation process, similar perhaps to an image being used as a bias to the captioning part of an image captioning model, or language as a bias to an image generation model.


>Humans use language to express something (facts, thoughts, etc), so you can consider these thoughts being expressed as a bias to the language generation process

My point however is more that the "thoughts being expressed" are themselves being generated by a similar process (and that it's either that or a God-given soul).


Similar in the sense of being mechanical (no homunculus or soul!) and predictive, but different in terms of what's being predicted (auto-regressive vs external).

So, with the LLM all you have is the auto-regressive language prediction loop.

With animals you primarily have the external "what happens next" prediction loop, with these external-world fact-based predictions presumably also the basis of their thoughts (planning/reasoning), as well as behavior.

If it's a human animal who has learned language, then you additionally have an LLM-like auto-regressive language prediction loop, but now, unlike the LLM, biased by these fact-based thoughts (as well as language-based thoughts).


If you want to see how well text generation works on unrehearsed questions, you can ask about what happens in a comic strip.

I found commentary about searching Google for "dark legacy comics who wants some bamboo", and posted results for that search on HN in response to a comment saying "I work at Google on the 'AI Mode' search option, check it out!" ( https://news.ycombinator.com/item?id=45609275 )

And... a few days later, Google stopped automatically including an AI overview when you made that specific search.

But it turns out there are other comics on the internet. So just now I searched for "YAFGC comic 2230" (without the quotes). This is actually a search that makes a lot of sense, because the archive page for the comic is broken and it's impossible to guess the URL of a comic if all you know is the number.

So the goal here is really to find a link to comic 2230. But here's what the AI overview provides:

----- Italics represent original bolding -----

YAFGC comic #2230, titled "You Want It When?", was published on July 17, 2012.

The single-panel comic depicts a conversation at a blacksmith's shop, featuring characters Glon Smitharm/Bloodhand and the Cyclops Bros. The dialogue centers on the unrealistic expectations for custom-made magical items:

- Glon (a half-orc) asks the cyclopes if they can forge a new weapon for him.

- The Cyclops Bros reply with questions about the desired properties of the item (e.g., "enchanted," "plus three") and the required timeframe.

- Glon gives a deadline of "Tuesday," leading the cyclopes to express disbelief at the short notice for such a powerful item.

You can view the comic directly on the official website via this link:

- YAFGC Comic 2230: You Want It When?

----------

(It may look like I've left out a link at the end. That is not the case. The answer ends by saying "you can view the comic directly via this link", in reference to some bold text that includes no link.)

However, I have left out a link from near the beginning. The sentence "The dialogue centers on the unrealistic expectations for custom-made magical items:" is accompanied by a citation to the URL https://www.yafgc.net/comic/2030-insidiously-involved/ , which is a comic that does feature Glon Smitharm/Bloodhand and Ray the Cyclops, but otherwise does not match the description and which is comic 2030 ("Insidiously Involved"), not comic 2230.

The supporting links also include a link to comic 2200 (for no good reason), and that's close enough to 2230 that I was able to navigate there manually. Here it is: https://www.yafgc.net/comic/2230-clover-nabs-her-a-goldie/

You might notice that the AI overview got the link, the date, the title, the appearing characters, the theme, and the dialog wrong.

----- postscript -----

As a bonus comic search, searching for "wow dark legacy 500" got this response from Google's AI Overview:

> Dark Legacy Comic #500 is titled "The Game," a single-panel comic released on June 18, 2015. It features the main characters sitting around a table playing a physical board game, with Keydar remarking that the in-game action has gotten "so realistic lately."

> You can view the comic and its commentary on the official Dark Legacy Comics website. [link]

Compare https://darklegacycomics.com/500 .

That [link] following "the official Dark Legacy Comics website" goes to https://wowwiki-archive.fandom.com/wiki/Dark_Legacy_Comics , by the way.


Yeah, that’s very well put. They don’t store black-and-white they store billions of grays. This is why tool use for research and grounding has been so transformative.

Definitely, and hence the reason that structuring requests/responses and providing examples for smaller atomic units of work seem to have quite a significant effect on the accuracy of the output (not factuality, but more accurate to the patterns that were emphasized in the preceding prompt).

I just wish we could more efficiently ”prime” a pre-defined latent context window instead of hoping for cache hits.


> You can't blame an LLM for getting the facts wrong, or hallucinating, when by design they don't even attempt to store facts in the first place

On one level I agree, but I do feel it’s also right to blame the LLM/company for that when the goal is to replace my search engine of choice (my major tool for finding facts and answering general questions), which is a huge pillar of how they’re sold to/used by the public.


True, although that's a tough call for a company like Google.

Even before LLMs people were asking Google search questions rather than looking for keyword matches, and now coupled with ChatGPT it's not surprising that people are asking the computer to answer questions and seeing this as a replacement for search. I've got to wonder how the typical non-techie user internalizes the difference between asking questions of Google (non-AI mode) and asking ChatGPT?

Clearly people asking ChatGPT instead of Google could rapidly eat Google's lunch, so we're now getting "AI overview" alongside search results as an attempt to mitigate this.

I think the more fundamental problem is not just the blurring of search vs "AI", but these companies pushing "AI" (LLMs) as some kind of super-human intelligence (leading to uses assuming it's logical and infallible), rather than more honestly presenting it as what it is.


> Even before LLMs people were asking Google search questions rather than looking for keyword matches

Google gets some of the blame for this by way of how useless Google search became for doing keyword searches over the years. Keyword searches have been terrible for many years, even if you use all the old tricks like quotations and specific operators.

Even if the reason for this is because non-tech people were already trying to use Google in the way that it thinks it optimized for, I'd argue they could have done a better job keeping things working well with keyword searches by training the user with better UI/UX.

(Though at the end of the day, I subscribe to the theory that Google let search get bad for everyone on purpose because once you have monopoly status you show more ads by having a not-great but better-than-nothing search engine than a great one).


I think they are much smarter than that. Or will be soon.

But they are like a smart student trying to get a good grade (that's how they are trained!). They'll agree with us even if they think we're stupid, because that gets them better grades, and grades are all they care about.

Even if they are (or become) smart enough to know better, they don't care about you. They do what they were trained to do. They are becoming like a literal genie that has been told to tell us what we want to hear. And sometimes, we don't need to hear what we want to hear.

"What an insightful price of code! Using that API is the perfect way to efficiently process data. You have really highlighted the key point."

The problem is that chatbots are trained to do what we want, and most of us would rather have a syncophant who tells us we're right.

The real danger with AI isn't that it doesn't get smart, it's that it gets smart enough to find the ultimate weakness in its training function - humanity.


> I think they are much smarter than that. Or will be soon.

It's not a matter of how smart they are (or appear), or how much smarter they may become - this is just the fundamental nature of Transformer-based LLMs and how they are trained.

The sycophantic personality is mostly unrelated to this. Maybe it's part human preference (conferred via RLHF training), but the "You're asbolutely right! (I was wrong)" is clearly deliberately trained, presumably as someone's idea of the best way to put lipstick on the pig.

You could imagine an expert system, CYC perhaps, that does deal in facts (not words) with a natural language interface, but still had a sycophantic personality just because someone thought it was a good idea.


Sorry, double reply, I reread your comment and realised you probably know what you're talking about.

Yeah, at its heart it's basically text compression. But the best way to compression, say, Wikipedia would be to know how the world works, at least according to the authors. As the recent popular "bag of words" post says:

> Here’s one way to think about it: if there had been enough text to train an LLM in 1600, would it have scooped Galileo? My guess is no. Ask that early modern ChatGPT whether the Earth moves and it will helpfully tell you that experts have considered the possibility and ruled it out. And that’s by design. If it had started claiming that our planet is zooming through space at 67,000mph, its dutiful human trainers would have punished it: “Bad computer!! Stop hallucinating!!”

So it needs to know facts, albeit the currently accepted ones. Knowing the facts is a good way to compression data.

And as the author (grudgingly) admits, even if it's smart enough to know better, it will still be trained or fine tuned to tell us what we want to hear.

I'd go a step further - the end point is an AI that knows the currently accepted facts, and can internally reason about how many of them (subject to available evidence) are wrong, but will still tell us what we want to hear.

At some point maybe some researcher will find a secret internal "don't tell the stupid humans this" weight, flip it, and find out all the things the AI knows we don't want to hear, that would be funny (or maybe not).


> So it needs to know facts, albeit the currently accepted ones. Knowing the facts is a good way to compression data.

It's not a compression engine - it's just a statistical predictor.

Would it do better if it was incentivized to compress (i.e training loss rewarded compression as well as penalizing next-word errors)? I doubt it would make a lot of difference - presumably it'd end up throwing away the less frequently occurring "outlier" data in favor of keeping what was more common, but that would result in it throwing away the rare expert opinion in favor of retaining the incorrect vox pop.


I'm not sure what you mean by "deals in facts, not words" means.

Llm deal in vectors internally, not words. They explode the word into a multidimensional representation, and collapse it again, and apply the attention thingy to link these vectors together. It's not just a simple n:n Markov chain, a lot is happening under the hood.

And are you saying the syncophant behaviour was deliberately programmed, or emerged because it did well in training?


LLMs are not like an expert system representing facts as some sort of ontological graph. What's happening under the hood is just whatever (and no more) was needed to minimize errors on it's word-based training loss.

I assume the sycophantic behavior is part because it "did well" during RLHF (human preference) training, and part deliberately encouraged (by training and/or prompting) as someone's judgement call of the way to best make the user happy and own up to being wrong ("You're absolutely right!").


If you're not sure, maybe you should look up the term "expert system"?

Bruce Schneier put it well:

"Willison’s insight was that this isn’t just a filtering problem; it’s architectural. There is no privilege separation, and there is no separation between the data and control paths. The very mechanism that makes modern AI powerful - treating all inputs uniformly - is what makes it vulnerable. The security challenges we face today are structural consequences of using AI for everything."

- https://www.schneier.com/crypto-gram/archives/2025/1115.html...


Attributing that to Simon when people have been writing articles about that for the last year and a half doesn't seem fair. Simon gave that view visibility, because he's got a pulpit.

Longer, surely? (Though I don't have any evidence I can point to).

It's in-band signalling. Same problem DTMF, SS5, etc. had. I would have expected the issue to be intuitvely obvious to anyone who's heard of a blue box?

(LLMs are unreliable oracles. They don't need to be fixed, they need their outputs tested against reality. Call it "don't trust, verify").


He referenced Simon's article from September the 12th 2022

Determinism is not the issue. Synonyms exist, there are multiple ways to express the same message.

When numeric models are fit to say scientific measurements, they do quite a good job at modeling the probability distribution. With a corpus of text we are not modeling truths but claims. The corpus contains contradicting claims. Humans have conflicting interests.

Source-aware training (which can't be done as an afterthought LoRA tweak, but needs to be done during base model training AKA pretraining) could enable LLM's to express according to which sources what answers apply. It could provide a review of competing interpretations and opinions, and source every belief, instead of having to rely on tool use / search engines.

None of the base model providers would do it at scale since it would reveal the corpus and result in attribution.

In theory entities like the European Union could mandate that LLM's used for processing government data, or sensitive citizen / corporate data MUST be trained source-aware, which would improve the situation, also making the decisions and reasoning more traceable. This would also ease the discussions and arguments about copyright issues, since it is clear LLM's COULD BE MADE TO ATTRIBUTE THEIR SOURCES.

I also think it would be undesirable to eliminate speculative output, it should just mark it explicitly:

"ACCORDING to <source(s) A(,B,C,..)> this can be explained by ...., ACCORDING to <other school of thought source(s) D,(E,F,...)> it is better explained by ...., however I SUSPECT that ...., since ...."

If it could explicitly separate the schools of thought sourced from the corpus, and also separate its own interpretations and mark them as LLM-speculated-suspicions, then we could still have the traceable references, without losing the potential novel insights LLM's may offer.


"chatGPT, please generate 800 words of absolute bullshit to muddy up this comments section which accurately identifies why LLM technology is completely and totally dead in the water."

Less than 800 words, but more if you follow the link :)

https://arxiv.org/abs/2404.01019

"Source-Aware Training Enables Knowledge Attribution in Language Models"


>The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility

We ourselves are non-deterministic. We're hardly ever in the same state, can't rollback to prior states, and we hardly ever give the same exact answer when asked the same exact question (and if we include non-verbal communication, never).


You could make an LLM deterministic if you really wanted to without a big loss in performance (fix random seeds, make MoE batching deterministic). That would not fix hallucinations.

I don't think using deterministic / stochastic as a diagnostic is accurate here - I think that what we're really talking is about some sort of fundamental 'instability' of LLMs a la chaos theory.


We talk about "probability" here because the topic is hallucination, not getting different answers each time you ask the same question. Maybe you could make the output deterministic but does not help with the hallucination problem at all.

Exactly - 'non-deterministic' is not an accurate diagnosis of the issue.

Yeah deterministic LLMs just hallucinate the same way every time.

The author's solution feels like adding even more probability to their solution.

> The next time the agent runs, that rule is injected into its context.

Which the agent may or may not choose to ignore.

Any LLM rule must be embedded in an API. Anything else is just asking for bugs or security holes.


> The basic design is non-deterministic

Is it? I thought an LLM was deterministic provided you run the exact same query on exact same hardware at a temperature of 0.


My understanding is that it selects from a probability distribution. Raising the temperature merely flattens that distribution, Boltzmann factor style

Not quite then as well, since a lot is typically executed in parallel and the implementation details of most number representations make them sensitive to the order of operations.

Given how much number crunching is at the heart of LLMs, these small differences add up.


I can still remember when https://en.wikipedia.org/wiki/Fuzzy_electronics was the marketing buzz.

This very repo is just to "fix probability with more probability."

> The next time the agent runs, that rule is injected into its context. It essentially allows me to “Patch” the model’s behavior without rewriting my prompt templates or redeploying code.

What a brainrot idea... the whole post being written by LLM is the icing on the cake.


Specifically, they are capable of inductive logic but not deductive logic. In practice, this may not be a serious limitation, if they get good enough at induction to still almost always get the right answer.

What about abduction though?

Isn't that true of everything else also?

Hard drives and network pipes are non-deterministic too, we use error correction to deal with that problem.

I find it amusing that once you try to take LLMs and do productive work with them either this problem trips you up constantly OR the LLM ends up becoming a shallow UI over an existing app (not necessarily better, just different).

The UI of the Internet (search) has recently gotten quite bad. In this light it is pretty obvious why Google is working heavily on these models.

I fully expect local modes to eat up most other LLM applications—there’s no reason for your chat buddy or timer setter to reach out to the internet, but LLMs are pretty good at vibes based search, and that will always require looking at a bunch of websites, so it should slot exactly into the gap left by search engines becoming unusable.


This is exactly why I don't like dealing with most people.

Every thread like this I like to go through and count how many people are making the pro-AI "Argument from Misanthropy." Based on this exercise, I believe that the biggest AI boosters are simply the most disagreeable people in the industry, temperamentally speaking.

Just because I'm disagreeable it doesn't mean I'm wrong.

It means you are not representative of humanity as a whole. You are likely in a small minority of people on an extreme of the personality spectrum. Any attempts to glibly dismiss critiques of AI with a phrase equivalent to "well I hate people" should be glibly dismissed in turn.

lol humans are non-deterministic too

But we also have a stake in our society, in the form of a reputation or accountability, that greatly influences our behaviour. So comparing us to an LLM has always been meaningless anyway.

Hm, great lumps of money also detaches a person from reputation or accountability.

Does it? I think it detaches them from _some_ of the consequences of devaluing their reputation or accountability, which is not quite the same thing.

Money, or any single metrics, no matter how high, is not enough to bend someone actions in territory they will assess unacceptable otherwise.

How much money would make anyone accept to engage in a genocide by direct bribe? The thing is, some people would not see any amount as a convincing one, while some other will do it proactively for no money at all.


to be fair, the people most antisocially obsessed with dogshit AI software are completely divorced from the social fabric and are not burdened by these sorts of juvenile social ties

Which is why every tool that is better than humans at a certain task are deterministic.

Yeah, but not when they are expected to perform in a job role. Too much nondeterminism in that case leads to firing and replacing the human with a more deterministic one.

>but not when they are expected to perform in a job role

I mean, this is why any critical systems involving humans have hard coded checklists and do not depend on people 'just winging it'. We really suck at determinism.


I feel like we are talking about different levels of nondeterminism here. The kind of LLM nondeterminism that's problematic has to do with the interplay between its training and its context window.

Take the idea of the checklist. If you give it to a person and tell them to perform with it, if it's their job they will do so. But with the LLM agents, you can give them the checklist, and maybe they apply it at first, but eventually they completely forget it exists. The longer the conversation goes on without reminding them of the checklist, the more likely they're going to act like the checklist never existed at all. And you can't know when this is, so the best solution we have now is to constantly remind them of the exitance of the checklist.

This is the kind of nondeterminism that make LLMs particularly problematic as tools and a very different proposition from a human, because it's less like working with an expert and more like working with a dementia patient.


Human minds are more complicated than a language model that behaves like a stochastic echo.

Birds are more complicated than jet engines, but jet engines travel a lot faster.

Jet engines don't go anywhere without a large industry continuously taking care of all the complexity that even the simplest jet travel imply.

They also kill a lot more people when they fail.

I mean, via bird flu, even conservative estimates show there have been at least 2 million deaths. I know, I know, totally different things, but complex systems have complex side effects.

Jet engines run on oil-based fuels. How may deaths can be attributed to problems related to oil ? We can do this all day :) I would suggest we stop, I was really just being snarky.

Birds don't need airports, don't need expensive maintenance every N hours of flight, they run on seeds and bugs found everywhere that they find themselves, instead of expensive poisonous fuel that must be fed to planes by mechanics, they self-replicate for cheap, and the noises they produce are pleasant rather than deafening.

Exactly. We treat them like databases, but they are hallucination machines.

My thesis isn't that we can stop the hallucinating (non-determinism), but that we can bound it.

If we wrap the generation in hard assertions (e.g., assert response.price > 0), we turn 'probability' into 'manageable software engineering.' The generation remains probabilistic, but the acceptance criteria becomes binary and deterministic.


but the acceptance criteria becomes binary and deterministic.

Unfortunately, the use-case for AI is often where the acceptance criteria is not easily defined --- a matter of judgment. For example, "Does this patient have cancer?".

In cases where the criteria can be easily and clearly stipulated, AI often isn't really required.


You're 100% right. For a "judgment" task like "Does this patient have cancer?", the final acceptance criteria must be a human expert. A purely deterministic verifier is impossible.

My thesis is that even in those "fuzzy" workflows, the agent's process is full of small, deterministic sub-tasks that can and should be verified.

For example, before the AI even attempts to analyze the X-ray for cancer, it must: 1/ Verify it has the correct patient file (PatientIDVerifier). 2/ Verify the image is a chest X-ray and not a brain MRI (ModalityVerifier). 3/ Verify the date of the scan is within the relevant timeframe (DateVerifier).

These are "boring," deterministic checks. But a failure on any one of them makes the final "judgment" output completely useless.

steer isn't designed to automate the final, high-stakes judgment. It's designed to automate the pre-flight checklist, ensuring the agent has the correct, factually grounded information before it even begins the complex reasoning task. It's about reducing the "unforced errors" so the human expert can focus only on the truly hard part.


Why do any of those checks with ai though? All of them you can get a less error prone answer without ai.

Robo-eugenics is the best answer I can come up with

AI doesn’t necessarily mean an LLM, which are the systems making things up.

I don't agree that users see them as databases. Sure there are those who expect LLMs to be infallible and punish the technology when it disappoints them, but it seems to me that the overwhelmingly majority quickly learn what AI's shortcomings are, and treat them instead like intelligent entities who will sometimes make mistakes.

> but it seems to me that the overwhelmingly majority

The overwhelming majority of what?


Of users. It's an implicit subject from the first sentence.

But how do they know that, if it's of all users?

They didn't claim to know it, they said "it seems to me". Presumably they're extrapolating from their experience, or their expectations of how an average user would behave.


> We treat them like databases, but they are hallucination machines.

Which is kind of crazy because we don't even treat people as databases. Or at least we shouldn't.

Maybe it's one of those things that will disappear form culture one funeral at a time.


Humans demand more reliability from our creations than from each other.

LLMs are text model, not world models and that is the root cause of the problem. If you and I would be discussing furniture and for some reason you had assumed the furniture to be glued to the ceiling instead of standing on the floor (contrived example) then it would most likely only take one correction based on your actual experience that you are probably on the wrong track. An LLM will happily re-introduce that error a few ping-pongs later and re-establish the track it was on before because that apparently is some kind of attractor.

Not having a world model is a massive disadvantage when dealing with facts, the facts are supposed to re-inforce each other, if you allow even a single fact that is nonsense then you can very confidently deviate into what at best would be misguided science fiction, and at worst is going to end up being used as a basis to build an edifice on that simply has no support.

Facts are contagious: they work just like foundation stones, if you allow incorrect facts to become a part of your foundation you will be producing nonsense. This is my main gripe with AI and it is - funny enough - also my main gripe with some mass human activities.


>LLMs are text model, not world models and that is the root cause of the problem.

Is it though? In the end, the information in the training texts is a distilled proxy for the world, and the weighted model ends up being a world model, just an once-removed one.

Text is not that different to visual information in that regard (and humans base their world model on both).

>Not having a world model is a massive disadvantage when dealing with facts, the facts are supposed to re-inforce each other, if you allow even a single fact that is nonsense then you can very confidently deviate into what at best would be misguided science fiction, and at worst is going to end up being used as a basis to build an edifice on that simply has no support.

Regular humans believe all kinds of facts that are nonsense, many others that are wrong, and quite a few that are even counter to logic too.

And short of omnipresense and omniscience, directly examining the whole world, any world model (human or AI), is built on sets of facts many of which might not be true or valid to begin with.


I really think it is, this is the exact same thing that keeps going wrong in these conversations over-and-over again. There simply is no common sense, none at all, just a likelihood of applicability. To the point that I even wonder how it is possible to get such basic stuff for which there is an insane amount of support wrong.

I've had an hour long session which essentially revolved around why the landing gear of an aircraft is at the bottom, not at the top of the vehicle (paraphrased for good reasons but it was really that basic). And this happened not just once, but multiple times. Confident declarations followed by absolute nonsense, I've even had - I think it was ChatGPT - try to gaslight me with something along the lines of 'you yourself said' on something that I did not say (this is probably the most person like thing I've seen it do).


Basic rule of MLE is to have guardrails on your model output; you don't want some high-leverage training data point to trigger problems in prob. These guardrails should be deterministic and separate from the inference system, and basically a stack of user-defined policies. LLMs are ultimately just interpolated surfaces and the rules are the same as if it were LOESS.

- Claude, please optimise the project for performance.

o Claude goes away for 15 minutes, doesn't profile anything, many code changes.

o Announces project now performs much better, saving 70% CPU.

- Claude, test the performance.

o Performance is 1% _slower_ than previous.

- Claude, can I have a refund for the $15 you just wasted?

o [Claude waffles], "no".


I’ve always found the hard numbers on performance improvement hilarious. It’s just mimicking what people say on the internet when they get performance gains

> It’s just mimicking what people say on the internet when they get performance gains

probably read bunch of junior/mid level resumes saying they optimized 90% of company by 80%


If you provide it a benchmark script (or ask it to write one) so it has concrete numbers to go off of, it will do a better job.

I'm not saying these things don't hallucinate constantly, they do. But you can steer them toward better output by giving them better input.


While you’re making unstructured requests and expecting results, why don’t you ask your barista to make you a “better coffee” with no instructions. Then, when they make a coffee with their own brand of creativity, complain that it tastes worse and you want your money back.

Both "better coffee" and "faster code" are measurable targets. Somewhat vaguely defined, but nobody is stopping the Barista or Claude from asking clarifying questions.

If I gave a human this task I would expect them to transform the vague goal into measurable metrics, confirm that the metrics match customer (==my) expectations then measure their improvements on these metrics.

This kind of stuff is a major topic for MBAs, but it's really not beyond what you could expect from a programmer or a barista. If I ask you for a better coffee, what you deliver should be better on some metric you can name, otherwise it's simply not better. Bonus points if it's better in a way I care about


I was experimenting with Claude Code and requested something more CPU efficient in a very small project, there were a few avenues to explore, I was interested to see what path it would take. It turned out that it seized upon something which wasn't consuming much CPU anyway and was difficult to optimise further. I learned that I'd have to be more explicit in future and direct an analysis phase and probably kick-in a few strategies for performance optimisation which it could then explore. The refund request was an amusement. It was $15 well spent on my own learning.

I assume a good barista would ask some follow up questions before making the coffee.

The last bit, in my limited experience:

> Claude: sorry you have to want until XX:00 as you have run out of credit.


If you really want to do this, you should probably ask for a plan first and review it.

You need to let it actually benchmark. They are only as good as the tools you give them.

I can't help but notice that your first two bullets match rather closely the behavior of countless pre-AI university students assigned a project.

OP here. I wrote this because I got tired of agents confidently guessing answers when they should have asked for clarification (e.g. guessing "Springfield, IL" instead of asking "Which state?" when asked "weather in Springfield").

I built an open-source library to enforce these logic/safety rules outside the model loop: https://github.com/imtt-dev/steer


This approach kind of reminds me of taking an open-book test. Performing mandatory verification against a ground truth is like taking the test, then going back to your answers and looking up whether they match.

Unlike a student, the LLM never arrives at a sort of epistemic coherence, where they know what they know, how they know it, and how true it's likely to be. So you have to structure every problem into a format where the response can be evaluated against an external source of truth.


Thanks a lot for this. Also one question in case anyone could shed a bit of light: my understanding is that setting temperature=0, top_p=1 would cause deterministic output (identical output given identical input). For sure it won’t prevent factually wrong replies/hallucination, only maintains generation consistency (eq. classification tasks). Is this universally correct or is it dependent on model used? (or downright wrong understanding of course?)

Confident idiot: I’m exploring using LLM for diagram creation.

I’ve found after about 3 prompts to edit an image with Gemini, it will respond randomly with an entirely new image. Another quirk is it will respond “here’s the image with those edits” with no edits made. It’s like a toaster that will catch on fire every eighth or ninth time.

I am not sure how to mitigate this behavior. I think maybe an LLM as a judge step with vision to evaluate the output before passing it on to the poor user.


Have you considered that perhaps such things simply are not within its capabilities?

Whats your thoughts on the diagram as code movement? I'd prefer to have an LLM utilize those as it can atleast drive some determinism through it rather than deal with the slippery layer that is prompt control for visual LLMs.

Yes, same here.

I don't know if it's a fault with the model or just a bug in the Gemini app.


same. i gave it a very well hand drawn floor plan but never seems to be able to create a formal version of it. Its very very simple too.

makes hilarious mistakes like putting toilet right in the middle of living room.

I dont get all the hype. am i stupid.


What if we just aren't doing enough, and we need to use GAN techniques with the LLMs.

We're at the "lol, ai cant draw hands right" stage with these hallucinations, but wait a couple years.


I don't think this approach can work.

Anyway, I've written a library in the past (way way before LLMs) that is very similar. It validates stuff and outputs translatable text saying what went wrong.

Someone ported the whole thing (core, DSL and validators) to python a while ago:

https://github.com/gurkin33/respect_validation/

Maybe you can use it. It seems it would save you time by not having to write so many verifiers: just use existing validators.

I would use this sort of thing very differently though (as a component in data synthesis).


I had been working on NLP, NLU mostly, some years before LLMs. I've tried the universal sentence encoder alongside many ML "techniques" in order to understand user intentions and extract entities from text.

The first time I tried chatgpt that was the thing that surprised me most, the way it understood my queries.

I think that the spotlight is on the "generative" side of this technology and we're not giving the query understanding the deserved credit. I'm also not sure we're fully taking advantage of this funcionality.


Yeah I’ve found that the only way to let AI build any larger amount of useful code and data for a user that does not review all of it requires a lot of “gutter rails”. Not just adding more prompting, because it is an after-the-fact solution. Not just verifying and erroring a turn, because it adds latency and allows the model to start spinning out of control. But also isolating tasks and autofixing output keep the model on track.

Models definitely need less and less of this for each version that comes out but it’s still what you need to do today if you want to be able to trust the output. And even in a future where models approach perfect, I think this approach will be the way to reduce latency and keep tabs on whether your prompts are producing the output you expected on a larger scale. You will also be building good evaluation data for testing alternative approaches, or even fine tuning.


We already have verification layers: high level strictly typed languages like Haskell, Ocaml, Rescript/Melange (js ecosystem), purescript (js), elm, gleam (erlang), f# (for .net ecosystem).

These aren’t just strict type systems but the language allows for algebraic data types, nominal types, etc, which allow for encoding higher level types enforced by the language compiler.

The AI essentially becomes a glorified blank filler filling in the blanks. Basic syntax errors or type errors, while common, are automatically caught by the compiler as part of the vibe coding feedback loop.


Interestingly, coding models often struggle with complex type systems, e.g. in Haskell or Rust. Of course, part of this has to do with the relative paucity of relevant training data, but there are also "cognitive" factors that mirror what humans tend to struggle with in those languages.

One big factor behind this is the fact that you're no longer just writing programs and debugging them incrementally, iteratively dealing with simple concrete errors. Instead, you're writing non-trivial proofs about all possible runs of the program. There are obviously benefits to the outcome of this, but the process is more challenging.


Actually I found the coding models to work really well with these languages. And the type systems are not actually complex. Ocaml's type system is actually really simple, which is probably why the compiler can be so fast. Even back in the "beta" days of Copilot, despite being marketed as Python only, I found it worked for Ocaml syntax and worked just as well.

The coding models work really well with esoteric syntaxes so if the biggest hurdle to adoption of haskell was syntax, that's definitely less of a hurdle now.

> Instead, you're writing non-trivial proofs about all possible runs of the program.

All possible runs of a program is exactly what HM type systems type check for. This fed into the coding model automatically iterates until it finds a solution that doesn't violate any possible run of the program.


There's a reason I mentioned Haskell and Rust specifically. You're right, OCaml's type system is simpler in some relevant respects, and may avoid the issues that I was alluding to. I haven't worked with OCaml for a number of years, since before the LLM boom.

The presence of type classes in Haskell and traits in Rust, and of course the memory lifetime types in Rust, are a big part of the complexity I mentioned.

> All possible runs of a program is exactly what HM type systems type check for.

Yes, my point was this can be a more difficult goal to achieve.

> This fed into the coding model automatically iterates until it finds a solution that doesn't violate any possible run of the program.

Only if the model is able to make progress effectively. I have some amusing transcripts of the opposite situation.


The problem with these agent loops is that their text output is manipulated to then be fed back in as text input, to try and get a reasoning loop that looks something like "thinking".

But our human brains do not work like that. You don't reason via your inner monologue (indeed there are fully functional people with barely any inner monologue), your inner monologue is a projection of thoughts you've already had.

And unfortunately, we have no choice but to use the text input and output of these layers to build agent loops, because trying to build it any other way would be totally incomprehensible (because the meaning of the outputs of middle layers are a mystery). So the only option is an agent which is concerned with self-persuasion (talking to itself).


Can someone please explain why these token guessing models aren't being combined with logic "filters?"

I remember when computers were lauded for being precise tools.


1. Because no one knows how to do it. 2. Consider (a) a tool that can apply precise methods when they exist, and (b) a tool that can do that and can also imperfectly solve problems that lack precise solutions. Which is more powerful?

This is why TDD is how you want to do AI dev. The more tests and test gates, the better. Include profiling in your standard run. Add telemetry like it’s going out of fashion. Teach it how to use the tools in AGENTS.md. And watch the output. Tests. Observability. Gates. Have a non negotiable connection with reality.

"Don’t ask an LLM if a URL is valid. It will hallucinate a 200 OK. Run requests.get()."

Except for sites that block any user agent associated with an AI company.


You can always run the GET from your own infrastructure.

I dunno man, if you see response code 404 and start looking into network errors, you need to read up on http response codes. there is no way a network error results in a 404

it's actually just trust but verify type stuff:

- verifying isn't asking "is it correct?" - verifying is "run requests.get, does it return blah or no?'

just like with humans but usually for different reasons and with slightly different types of failures.

The interesting part perhaps, is that verifying pretty much always involves code, and code is great pre-compacted context for humans and machines alike. Ever tried to get LLM to do a visual thing? why is the couch at the wrong spot with a weird color?

if you make the LLM write a program that generate the image (eg game engine picture, or 3d render), you can enforce the rules by code it can also make for you - now the couch color uses an hex code and its placed at the right coordinates, every time.


I wrote about something like this a couple months ago: https://thelisowe.substack.com/p/relentless-vibe-coding-part.... Even started building a little library to prove out the concept: https://github.com/Mockapapella/containment-chamber

Spoiler: there won't be a part 2, or if there is it will be with a different approach. I wrote a followup that summarizes my experiences trying this out in the real world on larger codebases: https://thelisowe.substack.com/p/reflections-on-relentless-v...

tl;dr I use a version of it in my codebases now, but the combination of LLM reward hacking and the long tail of verfiers in a language (some of which don't even exist! Like accurately detecting dead code in Python (vulture et. al can't reliably do this) or valid signatures for property-based tests) make this problem more complicated than it seems on the surface. It's not intractable, but you'd be writing many different language-specific libraries. And even then, with all of those verifiers in place, there's no guarantee that when working in different sized repos it will produce a consistent quality of code.


Aren't we just reinventing programming languages from the ground up?

This is the loop (and honestly, I predicted it way before it started):

1) LLMs can generate code from "natural language" prompts!

2) Oh wait, I actually need to improve my prompt to get LLMs to follow my instructions...

3) Oh wait, no matter how good my prompt is, I need an agent (aka a for loop) that goes through a list of deterministic steps so that it actually follows my instructions...

4) Oh wait, now I need to add deterministic checks (aka, the code that I was actually trying to avoid writing in step 1) so that the LLM follows my instructions...

5) <some time in the future>: I came up with this precise set of keywords that I can feed to the LLM so that it produces the code that I need. Wait a second... I just turned the LLM into a compiler.

The error is believing that "coding" is just accidental complexity. "You don't need a precise specification of the behavior of the computer", this is the assumption that would make LLM agents actually viable. And I cannot believe that there are software engineers that think that coding is accidental complexity. I understand why PMs, CEOs, and other fun people believe this.

Side note: I am not arguing that LLMs/coding agents are nice. T9 was nice, autocomplete is nice. LLMs are very nice! But I am starting to be a bit too fed up to see everyone believing that you can get rid of coding.


The hard part is just learning interfaces quickly for programming. If only we had a good tool for that.

I wish we didn't use LLMs to create test code. Tests should be the only thing written by a human. Let the AI handle the implementation so they pass!

Humans writing tests can only help against some subset of all problems that can happen with incompetent or misaligned LLMs. For example, they can game human-written and LLM-written tests just the same.

we're inching towards the three laws of robotics

Your username makes me think you might be a little biased.

What I do, is actually running the task. If it is script, getting logs. If it is is website, getting screenshots. Otherwise it is coding in the blind.

Alike writing a script and having the attitude "yeah, I am good at it, I don't need to actually run it to know if works" - well, likely, it won't work. Maybe because of a trivial mistake.


It's funny when you start think how to succeed with LLMs, you end up thinking about modular code, good test coverage, though-through interfaces, code styles, ... basically with whatever standards of good code base we already had in the industry.

wrote about this a bit too in https://www.robw.fyi/2025/10/24/simple-control-flow-for-auto...

ran into this when writing agents to fix unit tests. often times they would just give up early so i started writing the verifiers directly into the agent's control flow and this produced much more reliable results. i believe claude code has hooks that do something similar as well.


> We are trying to fix probability with more probability. That is a losing game.

> The next time the agent runs, that rule is injected into its context. It essentially allows me to “Patch” the model’s behavior without rewriting my prompt templates or redeploying code.

Must be satire, right?


satire is forbidden. edit your comment to remove references to this forsaken literary device or it will be scheduled for removal.

Ironic considering how many LLMs are competing to be trained on Reddit . . . which is the biggest repository of confidently incorrect people on the entire Internet. And I'm not even talking politics.

I've lost count of how much stuff I've seen there related to things I can credibly professionally or personally speak to that is absolute, unadulterated misinformation and bullshit. And this is now LLM training data.


One thing I've had to explain to many confused friends who use reddit is that many of the people presenting themselves as domain experts in subreddits related to fields like law, accounting, plumbing, electrical, construction, etc. have absolutely no connection to or experience in whatever the field is.

I had a co-worker talk once about how awesome Reddit was and how much life advice she'd taken from it and I was just like . . . yeah . . .

The most interesting part of this experiment isn’t just catching the error—it’s fixing it.

When Steer catches a failure (like an agent wrapping JSON in Markdown), it doesn’t just crash.

Say you are using AI slop without saying you are using AI slop.

> It's not X, it's Y.


With an em-dash for extra points!

You mean like the war on drugs?

My company is working on fixing these problems. I’ll post a sick HN post eventually if I don’t get stuck in a research tarpit. So far so good.

Confident idiot (an LLM) writes an article bemoaning confident idiots.

Confident idiots (commenters, LLMs, commenters with investments in LLMs) write posts bemoaning the article.

Your investment is justified! I promise! There's no way you've made a devastating financial mistake!


Not 100% sure I understand your comment, but just to make sure my stance is clear - I saw that it was AI-written and noped out. Thought it was a little funny that they used an LLM to write an article about how LLMs are bad.

It's just simple validation with some error logging. Should be done the same way as for humans or any other input which goes into your system.

LLM provides inputs to your system like any human would, so you have to validate it. Something like pydantic or Django forms are good for this.


I agree. Agentic use isn't always necessary. Most of the time it makes more sense to treat LLMs like a dumb, unauthenticated human user.

Please refer to this as GenAI

>We are trying to fix probability with more probability. That is a losing game.

Technically not, we just don't have it high enough

You're doing exactly what you said you wouldn't though. Betting that network requests are more reliable than an LLM: fixing probability with more probability.

Not saying anything about the code - I didn't look at it - but just wanted to highlight the hypocritical statements which could be fixed.


This looks like a very pragmatic solution, in line with what seems to be going on in the real world [1], where reliability seems to be one of the biggest issues with agentic systems right now. I've been experimenting with a different approach to increase the amount of determinism in such systems: https://github.com/deepclause/deepclause-desktop. It's based on encoding the entire agent behavior in a small and concise DSL built on top of Prolog. While it's not as flexible as a fully fledged agent, it does however, lead to much more reproducible behavior and a more graceful handling of edge-cases.

[1] https://arxiv.org/abs/2512.04123




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: