This looks cherry-picked, for example Claude Opus had a higher score on SWE-Benc... | Hacker News


enlyth 4 days ago | on: GPT-5.2

This looks cherry-picked. For example, Claude Opus had a higher score on SWE-Bench Verified, so they conveniently left it out. Also, GDPval is literally a benchmark made by OpenAI.

tobias2014 3 days ago

And who believes that the difference between 91.9% and 92.4% is significant in these benchmarks? Clearly these have margins of error that are swept under the rug.
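
To put a rough number on that margin of error, here is a minimal sketch. It assumes each benchmark item is an independent pass/fail trial and uses a hypothetical benchmark size of 500 items; the thread does not say which benchmark the 91.9% and 92.4% figures come from, so `N_ITEMS` and the helper `accuracy_margin` are illustrative only.

```python
import math


def accuracy_margin(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation (Wald) interval for an
    accuracy p measured on n independent pass/fail items.
    z = 1.96 corresponds to roughly 95% coverage."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical benchmark size: an assumption, not a figure from the thread.
N_ITEMS = 500

# The two scores quoted in the comment above.
score_a, score_b = 0.919, 0.924

margin_a = accuracy_margin(score_a, N_ITEMS)
margin_b = accuracy_margin(score_b, N_ITEMS)
gap = score_b - score_a

print(f"score A: {score_a:.1%} +/- {margin_a:.1%}")
print(f"score B: {score_b:.1%} +/- {margin_b:.1%}")
print(f"gap:     {gap:.1%}")
print("gap smaller than either margin:", gap < min(margin_a, margin_b))
```

Under these assumptions each score carries a margin of a bit over two percentage points, several times the 0.5-point gap. A paired comparison on the same items could be tighter than this, and a smaller benchmark would make the margins wider.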

minadotcom 4 days ago

agreed.
