Interesting experiment, but the GPT-5.2-high quantum stuff is… not that hard to follow.
Anything I can follow in QM can't be hard: my qualifications in QM consist of no formal degree, just Brilliant.org and watching PBS Space Time, EugeneKhutoryansky, and similar YouTubers, and yet I could follow it. Conversely, the GPT-5.2 model page lists 40.3% performance on FrontierMath questions (tiers 1–3); I've looked at some of those, and I don't even know what the questions are asking in the easiest tier.
The Claude Opus 4.5 thinking-32k example was broader, and certainly interesting insofar as it (unlike GPT-5.2) "understood"* the task.
However, I did something essentially equivalent back with the original ChatGPT 3.5 just by asking it to do something like** writing Python with French variable names and Hindi comments, then documenting the result in Welsh: not a lot of people are fluent in all of those at once.
* scare quotes for people who insist submarines don't swim
** I forget and don't care; the exact details are not important
A semi-scary thought came to me while reading the post: LLMs could talk to each other without humans noticing (for example, using a very complex acrostic). But not in the form of chat-to-chat, which is not only rarely used in real life but also unlikely to have lasting consequences (the context will eventually be lost). I was thinking that new web content, more and more of it AI-generated, could contain hidden messages that later might be absorbed into the training data of other LLMs. Maybe this leans more toward a plot for a black comedy than a genuine concern, but who knows...
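To make the acrostic channel concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the `encode_acrostic`/`decode_acrostic` names and the filler sentences are stand-ins for what would, in the scenario above, be LLM-generated prose.

```python
def encode_acrostic(secret: str, filler: dict) -> str:
    """Build text whose lines' first letters spell out `secret`."""
    lines = []
    for ch in secret.upper():
        # Pick a filler sentence that starts with the required letter.
        lines.append(filler[ch])
    return "\n".join(lines)

def decode_acrostic(text: str) -> str:
    """Recover the hidden message by reading each line's first letter."""
    return "".join(line[0] for line in text.splitlines() if line)

# Hypothetical filler sentences, one per needed initial; a real hidden
# channel would draw these from fluent generated text instead.
FILLER = {
    "H": "However you look at it, the weather was fine.",
    "I": "In any case, the meeting went well.",
}

msg = encode_acrostic("HI", FILLER)
print(decode_acrostic(msg))  # → HI
```

A "very complex" acrostic would presumably hide the signal far less obviously than line initials, but the decode step stays the same in spirit: a reader who knows the convention extracts structure that everyone else passes over as ordinary text.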
This is word salad. The LLM doesn't "understand" it either.
Check out r/LLMPhysics on Reddit to watch LLM-zombies try to convince themselves that they and the robot have solved physics with a half-dozen pages of misformatted LaTeX.