Hacker News
Show HN: Epstein's emails reconstructed in a message-style UI (OCR and LLMs) (github.com/toon-noot)
44 points by toon-noot 1 day ago | 8 comments
This project reconstructs the Epstein email records from the recent U.S. House Oversight Committee releases using only public-domain documents (23,124 image files + 2,800 OCR text files).

Most email pages contain only one real message, buried under layers of repeated headers/footers. I wanted to rebuild the conversations without all the surrounding noise.

I used an OCR + vision-LLM pipeline to extract individual messages from the email screenshots, normalize senders/recipients, rebuild timestamps, detect duplicates, and map threads. The output is a structured SQLite database that runs client-side via SQL.js (WebAssembly).
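
To make the normalize-and-deduplicate steps concrete, here is a minimal sketch of one way they could work. It is not taken from the repository: the function names, the dict layout of a message, and the choice of a SHA-256 fingerprint over normalized sender plus whitespace-collapsed body are all assumptions for illustration.

```python
import hashlib
import re


def normalize_sender(raw: str) -> str:
    """Hypothetical normalizer: drop quotes/angle brackets, collapse spaces, lowercase."""
    cleaned = re.sub(r"[\"'<>]", " ", raw)
    return re.sub(r"\s+", " ", cleaned).strip().lower()


def message_fingerprint(sender: str, body: str) -> str:
    """Hash the normalized sender and body so OCR spacing/case noise is ignored."""
    canon_body = re.sub(r"\s+", " ", body).strip().lower()
    key = normalize_sender(sender) + "\n" + canon_body
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def dedupe(messages):
    """Keep the first message seen for each fingerprint, preserving input order."""
    seen = {}
    for msg in messages:
        fp = message_fingerprint(msg["sender"], msg["body"])
        seen.setdefault(fp, msg)  # later duplicates are dropped
    return list(seen.values())
```

Exact hashing like this only catches copies that differ in case or whitespace; OCR character errors would need a fuzzy comparison instead.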

The repository includes the full extraction pipeline, data cleaning scripts, schema, limitations, and implementation notes. The interface is a lightweight PWA that displays the reconstructed messages in a phone-style UI, with links back to every original source image for verification.
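
As a rough illustration of how a structured SQLite output can keep every message tied to its source image, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and index are hypothetical and not the project's actual schema; timestamps are assumed to be stored as ISO 8601 strings so they sort correctly as text.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical messages table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE messages (
            id           INTEGER PRIMARY KEY,
            thread_id    INTEGER NOT NULL,
            sender       TEXT NOT NULL,
            sent_at      TEXT,            -- ISO 8601 string (assumption)
            body         TEXT NOT NULL,
            source_image TEXT NOT NULL    -- file name of the original scan
        );
        CREATE INDEX idx_messages_thread ON messages(thread_id, sent_at);
        """
    )
    return conn


def thread_messages(conn: sqlite3.Connection, thread_id: int):
    """Return one thread in chronological order, each row with its source image."""
    cursor = conn.execute(
        "SELECT sender, sent_at, body, source_image "
        "FROM messages WHERE thread_id = ? ORDER BY sent_at, id",
        (thread_id,),
    )
    return cursor.fetchall()
```

The same query text would work unchanged against a client-side SQLite build, since it uses only standard SQL.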

Live demo: https://epsteinsphone.org

All source data is from the official public releases; no leaks or private material.

Happy to answer questions about the pipeline, LLM extraction, threading logic, or the PWA implementation.





The convo with Noam Chomsky is interesting. The Deepak Chopra one, talking about Trump being 'loco', is quite funny.

Neat data visualization solution!


Thanks!

Android/Firefox. Nothing's happening when I tap the icons on the demo site.

Thanks for the feedback. I'll try to reproduce. I spent more time on the data pipeline than on testing the UI across platforms...

This is really cool, I enjoyed going through them in this form. Thanks

One nit: the message view seems to auto-hyphenate long words on line-breaks to pack in more text, but one of the things that's struck me about Epstein is how utterly incompetent he was with punctuation. Those correctly-inserted hyphens distract from that impression.

Brilliant. Feel bad asking for something more, but an inline annotation of who these people are would take it over the top.



