
The example with dashes made me chuckle, because there actually is a standard for setting off a signature --- two dashes with one trailing space. Searching for this works tolerably well in my experience; if it's in a message and not more than X lines from the end, it's a pretty reliable indicator.

Where Mailgun's library looks really useful to me is in parsing HTML mail (ironic that a supposed semantic markup language makes this problem harder to solve, but that's the net for you...)
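
A minimal sketch of the heuristic above, in Python: look for a line that is exactly "-- " and accept it only if it sits within the last few lines of the message. The function name and the default of 10 lines are assumptions for illustration (X is left unspecified above), and this is not how Mailgun's library does it.

```python
from typing import Optional


def find_signature_start(body: str, max_lines_from_end: int = 10) -> Optional[int]:
    """Return the 0-based line index of the "-- " delimiter, or None.

    max_lines_from_end is a hypothetical stand-in for "X": the largest
    number of lines allowed to follow the delimiter.
    """
    lines = body.splitlines()  # also copes with CRLF line endings
    first_candidate = max(0, len(lines) - 1 - max_lines_from_end)
    # Scan backwards so the delimiter closest to the end wins.
    for index in range(len(lines) - 1, first_candidate - 1, -1):
        if lines[index] == "-- ":  # two hyphens plus exactly one trailing space
            return index
    return None
```

Scanning from the end means a stray "-- " early in a long message is ignored once it falls outside the window.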



How is this a standard? I agree that it's fairly common, but I see a lot of different variations just in my own inbox.


It is literally a "proposed standard": http://tools.ietf.org/html/rfc3676#section-4.3


Indeed. Checking for a signature in email is pretty easy[1]: look for eol-dash-dash-space-eol ("^-- $"), if found, cut there. If not found, the sender and/or email client can't be trusted to compose proper email -- don't attempt automatic signature stripping, and forward the whole (probably top-posted) mess.

I'm only half-joking.

[1] Because, if "proper" quoting is used and the client can't properly strip signatures -- at least those included signatures will be "^>+ -- $", not "^-- $". Now assuming a proper email client/user, those sigs should've been stripped anyway... But such an assumption is likely to lead to tears and unhappiness anyway...

It even works for "proper" top-posters (as if there was such a thing) -- because the last reply will come first, followed by a dash-dash-space delimited signature, followed by all the stuff you'd typically strip out in a reply.
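
For what it's worth, the rule above fits in a few lines of Python. This is only a sketch under the assumptions stated in this comment: the names are made up for illustration, line endings are normalized to "\n" first, and a message without the delimiter is returned untouched rather than guessed at.

```python
import re

# "^-- $" with MULTILINE: a line consisting of two hyphens and one space.
SIG_DELIMITER = re.compile(r"^-- $", re.MULTILINE)
# Signatures carried along inside quoted text look like "> -- " instead.
QUOTED_SIG_DELIMITER = re.compile(r"^>+ -- $", re.MULTILINE)


def strip_signature(body: str) -> str:
    """Cut the message at the first unquoted "-- " line, if there is one."""
    body = body.replace("\r\n", "\n")
    match = SIG_DELIMITER.search(body)
    if match is None:
        # No delimiter: don't attempt any stripping, forward the whole thing.
        return body
    return body[:match.start()]
```

Cutting at the first match is what makes the top-posting case work: the new reply and its signature come before the quoted material, so everything after the delimiter goes. Quoted delimiters never match SIG_DELIMITER because the line starts with ">".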


My team has parsed over a billion emails in the past 3 years, auto-updating our clients' address books.

The "-- " is indeed the standard and has been the most common delimiter ever since Usenet in '94, but of course we've built a ton of variation into our algorithms to handle everything else you might see.

Feel free to check out our infographic on what you'll find in the average professional's email signature: http://www.evercontact.com/blog/infographic-the-anatomy-of-a...


FYI: the infographic shows the delimiter as "--" instead of "-- "


As I mentioned elsewhere, that's more than a small nitpick, because something like:

    XI
    --
    Chapter XI...

is perfectly fine in plain-text email, so the trailing space is very useful: you'd rarely need to escape "-- " on a line by itself, which isn't true of a bare "--".
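
To make the difference concrete, here is a small Python sketch comparing a strict matcher (exactly "-- ") with a loose one that ignores trailing whitespace. The function and its strict flag are hypothetical, purely for illustration.

```python
def delimiter_lines(text: str, strict: bool = True) -> list[int]:
    """Return the 0-based indices of lines treated as a signature delimiter.

    strict=True  accepts only "-- " (two hyphens, one trailing space).
    strict=False accepts "--" with any amount of trailing whitespace.
    """
    hits = []
    for index, line in enumerate(text.splitlines()):
        if strict:
            is_delimiter = line == "-- "
        else:
            is_delimiter = line.rstrip() == "--"
        if is_delimiter:
            hits.append(index)
    return hits


message = "XI\n--\nChapter XI...\n\nRegards\n-- \nAlice\n"
strict_hits = delimiter_lines(message)               # only the real delimiter
loose_hits = delimiter_lines(message, strict=False)  # also the "--" under "XI"
```

With the loose matcher, a parser that cuts at the first hit would throw away the chapter text along with the signature.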


figure dash, em dash, en dash or horizontal bar?


Hyphen/minus. ASCII 0x2D or EBCDIC 0x60.
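
A quick Python check of those code points, as a sketch. Python's "cp037" codec is used here as one representative EBCDIC code page (that choice is an assumption), and the helper name is made up.

```python
HYPHEN_MINUS = "-"  # U+002D, the character the delimiter is built from

# The lookalikes asked about, written as escapes so they can't be confused.
LOOKALIKES = {
    "figure dash": "\u2012",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "horizontal bar": "\u2015",
}


def is_sig_delimiter(line: str) -> bool:
    """True only for two ASCII hyphen-minus characters plus one space."""
    return line == HYPHEN_MINUS * 2 + " "


ascii_code = ord(HYPHEN_MINUS)               # 0x2D
ebcdic_byte = HYPHEN_MINUS.encode("cp037")   # b"\x60" in this code page
```

None of the typographic dashes compare equal to the hyphen-minus, so a line built from them is not treated as a delimiter by an exact string comparison.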



