Makes sense. It just set off some statistical alarm bells in my head to see a model marked as passing with 1 trial, and some models marked as failing with 5. What if the probability of success is 5% for both models? How confident are we that our grading of the models is correct? It's an interesting problem.
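To make that concrete, here's a quick back-of-the-envelope (the 5% is my hypothetical; the 1 and 5 are the trial counts from the site):

    # If both models truly succeed 5% of the time, how often would they end up
    # labeled the way they did?
    p = 0.05                      # assumed true success rate for both models
    p_pass_in_1 = p               # chance the 1-trial model happens to pass
    p_fail_all_5 = (1 - p) ** 5   # chance the 5-trial model fails every attempt
    print(p_pass_in_1)            # 0.05
    print(p_fail_all_5)           # ~0.77

So with counts that small, "passing" and "failing" are pretty noisy labels.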
The current metric is actually quite strong -- it mirrors the real-world use case of people trying a few times and being satisfied if any one of the attempts is what they're looking for. It rewards diversity of responses.
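A tiny sketch of that intuition (numbers made up, and "diverse" treated as roughly independent, which is generous):

    # Any-of-k scoring: k genuinely different attempts vs. k near-copies.
    p, k = 0.3, 5
    p_any_diverse = 1 - (1 - p) ** k   # ~0.83 if the attempts really differ
    p_any_identical = p                # still 0.3 if they're all the same answer
    print(p_any_diverse, p_any_identical)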
Actually, search engines do this too: Google something with many possible meanings -- like "egg" -- and you'll get a set of intentionally diversified results. I get Wikipedia; then a restaurant; then YouTube cooking videos; Big Green Egg's homepage; news stories about egg shortages. Each individual link is very unlike the others, to maximize the chance that one of them is the one you want.
It's made a little better by the fact that there are something like a dozen different prompts, so across all of them each model had a fair number of opportunities to show off.
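Rough sense of how much the extra prompts help, assuming you can pool trials across prompts (a simplification -- prompts differ in difficulty):

    import math
    p = 0.3                                      # hypothetical per-trial success rate
    se = lambda n: math.sqrt(p * (1 - p) / n)    # standard error of the observed rate
    print(se(5))        # ~0.20 from one prompt's 5 trials
    print(se(12 * 5))   # ~0.06 pooled across ~a dozen prompts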
Cool site btw! Thanks for sharing.