
Code is much much harder to check for errors than an email.

Consider, for example, the following Python code:

    x = (5)
vs

    x = (5,)
One is a literal 5, and the other is a single element tuple containing the number 5. But more importantly, both are valid code.
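For the record, a minimal sketch of the difference (in CPython or any conforming Python, the parentheses alone are just grouping; the trailing comma is what makes the tuple):

```python
x = (5)   # grouping only; x is the int 5
y = (5,)  # one-element tuple containing 5

print(type(x).__name__)  # int
print(type(y).__name__)  # tuple
print(x == y)            # False: an int never equals a tuple
```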

Now imagine trying to spot that one missing comma somewhere in the 20 kloc of code someone so proudly claims AI helped them "write", especially if it's in a cold path. You won't see it.



> Code is much much harder to check for errors than an email.

Disagree.

Even though performing checks on dynamic PLs is much harder than on static ones, PLs are designed to be non-ambiguous: there should be exactly one interpretation for any syntactically valid expression. Your example will unambiguously resolve to an error in a standard-conforming Python interpreter.

On the other hand, natural languages are not restricted by ambiguity; that's why something like Poe's law exists. There's simply no way to resolve the ambiguity by just staring at the words themselves: you need additional information to know the author's intent.

In other words, an "English interpreter" cannot exist. Remove the ambiguities so that an "interpreter" becomes possible, and you'll end up with non-ambiguous, Python- or COBOL-like languages.

With that said, I agree with your point that blindly accepting 20kloc is certainly not a good idea.


Tell me you've never written any python without telling me you've never written any python...

Those are both syntactically valid lines of code (it's actually one of Python's many warts). They are not ambiguous in any way: one is a number, the other is a tuple. They evaluate to completely different types.

My example will unambiguously NOT give an error, because both lines are standard conforming. You would have noticed that if you had taken five seconds to type them into the REPL.


  > Those are both syntactically valid lines of code (it's actually one of Python's many warts). They are not ambiguous in any way: one is a number, the other is a tuple. They evaluate to completely different types.
You just demonstrated how hard it is to "check" an email or text message by missing the point of my reply.

  > "Now imagine trying to spot that one missing comma among the 20kloc of code"
I assume your previous comment was trying to bring up Python's dynamic typing and late binding, and use them as an example of how problematic they can be when someone blindly merges 20 kloc of LLM-generated Python code.

My reply, "Your example will unambiguously resolve to an error in a standard-conforming Python interpreter.", tried to respond to the possibility of such an issue. Even though it's probably not the program behavior you want, Python, being a programming language, is 100% guaranteed to interpret it unambiguously.

I admit I should have phrased that less ambiguously.

Even if it's hard, you can run a type checker to statically catch such problems. And when that's not possible because of heavy use of Python's dynamic typing, you can still run the code and check its behavior at runtime. Hard to check, maybe, but not impossible.
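For what it's worth, a sketch of the static-checker route (assuming mypy; the exact wording of its message varies by version):

```python
# A plain assignment looks fine to a checker, but an explicit
# annotation turns the stray comma into a checkable error.
# CPython itself won't complain: annotations aren't enforced
# at runtime.

x: int = (5,)  # a checker like mypy flags this as an
               # incompatible assignment (tuple vs int)

# At runtime the annotation is ignored and x really is a tuple,
# which is where the run-it-and-check fallback comes in:
print(isinstance(x, tuple))  # True
```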

On the other hand, it's impossible to perform a perfectly consistent "check" on this reply or on an email written in a natural language; the person reading it might interpret the message in a completely different way.


In my experience, the example you give here is exactly the kind of problem that AI-powered code reviews are really good at spotting, especially in codebases with tens of thousands of lines, where a human being might well get scrolling blindness when quickly moving around them.


The AI is the one which made the mistake in the first place. Why would you assume it's guaranteed to find it?

The few times I've given LLMs a shot, they warned me that some validation was missing when that exact validation was exactly one line below where they stopped looking.

And even if it did pass an AI code review, that's meaningless anyway. It still needs to be reviewed by an actual human before going into production, and that person would still get scrolling blindness whether or not the AI "reviewer" detected the error.


> The AI is the one which made the mistake in the first place. Why would you assume it's guaranteed to find it?

I didn't say they were guaranteed to find it: I said they were really good at finding these sorts of errors. Not perfect: just really good. I also didn't make any assumption: I said in my experience, by which I mean the code you shared is similar to a portion of the errors that I've seen LLMs find.

Which LLMs have you used for code generation?

I mostly use claude-opus-4-6 at the moment for development, and have had mostly good experiences. This is not to say it never gets anything wrong, but I'm definitely more productive with it than without it. On GitHub I've been using Copilot for more limited tasks as an agent: I find it's decent at code review, but more variable at fixing problems it finds, and so I quite often opt for manual fixes.

And then the other question is, how do you use them? I tend to keep them on quite a short leash, so I don't give them huge tasks, and on those occasions where I am doing something larger or more complex, I tend to write out quite a detailed and prescriptive prompt (which might take 15 minutes to do, but then it'll go and spend 10 minutes to generate code that might have taken me several hours to write "the old way").



