
I work for a (non-profit) journal publisher, and we do indeed cut off robot downloading, but not after one click of a link. We analyze traffic to determine robot downloads. I suspect the entire university did not get cut off in this incident. Blocking is usually done on a per-IP basis, and unless the university proxies all of its journal traffic through a single IP, which is not common, saying the whole university was blocked may be an exaggeration. I personally wish we had no robot monitoring, but then again we would get heavy spidering of large files. Roughly, the detection is just per-IP counting over a sliding window; a minimal sketch of that idea follows (the thresholds and names below are made up for illustration, not our actual system):
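
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 300   # look-back window per IP (assumption)
    MAX_DOWNLOADS = 100    # downloads allowed per window per IP (assumption)

    recent = defaultdict(deque)  # ip -> timestamps of recent downloads

    def record_download(ip, now=None):
        """Record a download; return True if the IP now looks like a robot."""
        now = time.time() if now is None else now
        q = recent[ip]
        q.append(now)
        # Drop timestamps that have fallen outside the window.
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()
        return len(q) > MAX_DOWNLOADS

The point is that a single click, or even a burst of normal reading, never trips it; only sustained bulk downloading from one address does.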


Is there a reason to block instead of throttling?


We do have a CAPTCHA too, before the block. Basically, to get blocked you have to really work hard at it. We also do not mind limited robot use for cases like downloading all papers for a given search term or author, but we do not want people downloading our entire corpus either, so throttling alone is not an option. In rough terms the escalation looks like the sketch below (a toy example with made-up thresholds, not our production logic):
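
    CAPTCHA_AFTER = 1   # strikes before a CAPTCHA challenge is shown (assumption)
    BLOCK_AFTER = 3     # strikes before the IP is blocked outright (assumption)

    strikes = {}  # ip -> number of robot-detection strikes

    def handle_suspicious(ip):
        """Return the action to take for an IP that just tripped robot detection."""
        strikes[ip] = strikes.get(ip, 0) + 1
        if strikes[ip] >= BLOCK_AFTER:
            return "block"
        if strikes[ip] >= CAPTCHA_AFTER:
            return "captcha"
        return "allow"

So a hard block only happens after the CAPTCHA step has already been tripped repeatedly.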

I think the case mentioned in the article is definitely a heavy-handed approach. When it comes down to it, at my place we are just trying to block the wget -r's of the world.


Is there any reason why you don't want people downloading your entire corpus?


Or a CAPTCHA?



