This is a weekend project I started on as a way to learn Go that I quickly realized would be something worth sharing. It lets you queue jobs from Rails that are processed in Go, which can speed things up considerably.
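For a concrete picture of the moving parts, here is a stdlib-only sketch of the core mechanism: Rails pushes a Resque-format JSON blob into Redis, and a Go process pops it and dispatches to a registered function. The names (`registry`, `dispatch`, `workerFunc`) are my own stand-ins, not necessarily goworker's actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload mirrors the JSON that Resque stores in Redis,
// e.g. {"class":"Hello","args":["world"]}.
type payload struct {
	Class string        `json:"class"`
	Args  []interface{} `json:"args"`
}

// workerFunc is the shape a Go worker takes: the queue name plus decoded args.
type workerFunc func(queue string, args ...interface{}) error

// registry maps Resque class names to Go worker functions.
var registry = map[string]workerFunc{}

func init() {
	registry["Hello"] = func(queue string, args ...interface{}) error {
		fmt.Printf("hello from queue %s: %v\n", queue, args)
		return nil
	}
}

// dispatch decodes one job blob (as popped from Redis) and runs the
// matching registered worker.
func dispatch(queue string, raw []byte) error {
	var p payload
	if err := json.Unmarshal(raw, &p); err != nil {
		return err
	}
	w, ok := registry[p.Class]
	if !ok {
		return fmt.Errorf("no worker registered for class %q", p.Class)
	}
	return w(queue, p.Args...)
}

func main() {
	// Stand-in for a blocking pop from resque:queue:default.
	_ = dispatch("default", []byte(`{"class":"Hello","args":["world"]}`))
}
```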
I would appreciate all kinds of feedback, and for anyone who wants to try goworker out, if you give me your email at the bottom of the page, I'll send you my contact info for some free one-on-one help getting started.
Very cool - I just interviewed with a company that is using Sidekiq to process up to millions of jobs (at about 2/sec per worker) and I asked them to look into using Go for workers. I can send this to them.
Hmm. I took a look at this and it could be a possibility, but RQ's message format is much different from Resque and Sidekiq's. See https://gist.github.com/benmanns/6583712 for a comparison.
Hello there! Resque maintainer here. This is super neat! I'll make sure to link to goworker in our docs. I have wanted to do something like this, but haven't gotten around to it yet.
Please let me know how I can help you stay abreast of what we're doing with Resque 2.
Hello! Thanks for your comment and tweet. I've been watching both Resque and Resque 2 while writing goworker. I would definitely like to stay in contact with you.
We're in the midst of switching quite a large Sidekiq deployment so that inbound work is done in NSQ. At a high level, NSQ allows for different strategies like fanout (having the same msg processed by different kinds of workers) without having to build them into a lib. This work involves outgoing API requests and minimal scraping.
We're keeping our outbound work in sidekiq for now (less throughput; not as necessary to have fast deserialization). It would be interesting to move all the workers to Go (and explore the advantages of the libraries linked here) as well.
How do you handle when Heroku sends a SIGTERM to kill your process? I couldn't find a way to preempt running workers, so everything running on Heroku has to finish within 10 seconds or you can lose it.
Author of go-workers here. On SIGTERM, I stop accepting new work, and wait for all running workers to finish before halting.
If workers take longer than the 10 seconds Heroku gives you, go-workers uses reliable queueing (using http://redis.io/commands/brpoplpush) so the job will run again next time you start up the process.
Okay, cool. I do the same thing on the polling side, but don't use reliable queueing yet. I think that is probably the best way to handle the failures.
I took a look, and am a little confused. Does the code for the workers need to be written in Go, not Ruby?
Have you also tested the performance of this with 40-50 workers doing intensive I/O work such as crawling web pages? One of the main disadvantages of Sidekiq is that the workers just freeze every 15-30 minutes when you have a ton of workers crawling web pages. The only workaround for me is to set up a cron job that restarts the workers if I detect this pattern.
So workers written in Go are 1,000x faster than workers written in Ruby? It almost seemed from the documentation that the workers were written in Ruby as well.
They are typically 10x faster in "raw" speed and 100x faster for concurrency. However, since Go's goroutines take almost no memory, you can run over 1,000 workers in the same amount of memory as one Ruby worker.
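The "over 1,000 workers in the memory of one Ruby worker" point comes from goroutines starting with only a few KB of stack. A minimal sketch (`runPool` is my own stand-in, not goworker's API) of fanning jobs out to a thousand concurrent workers:

```go
package main

import (
	"fmt"
	"sync"
)

// runPool fans jobCount jobs out to n concurrent workers. Each worker is a
// goroutine, which begins with only a few KB of stack, so n can comfortably
// be in the thousands on a single machine.
func runPool(n, jobCount int) int {
	jobs := make(chan int)
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs { // each worker pulls from the shared queue
				mu.Lock()
				done++
				mu.Unlock()
			}
		}()
	}
	for j := 0; j < jobCount; j++ {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
	return done
}

func main() {
	fmt.Println(runPool(1000, 10000)) // 1,000 workers, 10,000 jobs
}
```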
On the queuing side, you use Ruby. For the examples, I was trying to show what the "equivalent" Ruby worker was so you could get an idea of what kind of workload it was. However, I do see that it is confusing.
I suppose the main use case is working on data loosely coupled with the web app, then? For example, if we have to do any big processing on active_record objects, we would have to reimplement the whole model stack in Go?
You also forget the competition from Javascript as well. Rewrite all of the things in all of the languages.
I'd much rather see neat algorithms and libraries being re-written in C with an FFI-friendly API. I think it would reduce the number of "Show HN: I rewrote X in Y" posts. The majority of HN users rarely care about Y. Such posts are just attention-grabbing noise.
If it was written in C you could just write bindings to it in Y and nobody but Y developers would need to hear about it. Instead we could read about X on HN. X is interesting.
Actually, I've wondered if there's a docker for libraries in different languages. The closest I've found are http APIs at the application/service level, or libraries mature enough to be run at the command line, so you can use them in conjunction with other unix commands.
It doesn't seem right that we can't leverage libraries in other languages without significant effort.
Author of Jesque here. Jesque has numerous additional features over Resque, including the ability to dynamically change the queues that a worker polls and an efficient pooled worker implementation. Also available is a Java version of resque-web [1]
Regarding goworker: so the worker logic has to be written in Go, correct? For my Rails app, if it's something simple like sending email, I can use that, but if it involves something where I need ActiveRecord, is it still a good option?
> For simple dequeuing, deserialization, execution, and updating stats (as with the Hello worker above), goworker is 80x faster than Resque and 23x faster than Sidekiq.
In real-world use cases, how often is that kind of overhead a significant portion of total run time, compared with the actual work? If it's a small portion, wouldn't that make the speedup fairly irrelevant? Does it actually speed up real-world use cases significantly?
Of course, your actual work payload may be faster in Go than Ruby too, if you know how to write it well.
You're right that this is less significant if you are hitting an empty queue (triggering a timeout) between each request.
However, even when not comparing (seconds to compute 10,000 jobs) benchmarks, you still get benefits from goworker. Since dequeuing, deserialization, and updating stats are faster, the latency between "job insert" and "job started" is lower, which means faster interface updates.
You can also restructure your workers. Instead of running one worker that loops 5,000 times because the Redis overhead is too high, you can have one job queue 5,000 more which can be run on several distributed workers.
Your conclusions about latency, and being able to restructure things so tasks are much smaller -- make sense. (Potentially; I'd still want to benchmark my own real world app before assuming!)
(I am confused with your first paragraph and a half, I think you have some typos in there, or I just don't understand what you're saying. But I understand what you're saying about latency and about being able to restructure tasks to be smaller, thanks.)
I mean that if your users are waiting on jobs, they get the result when `encode time` + `enqueue time` + `wait time` + `dequeue time` + `decode time` + `execute time` is completed.
Even though you might not need raw 1000 jobs/second throughput, your users will still benefit by reducing `wait time` + `dequeue time` + `decode time`, which goworker does.
I would actually expect the simple dequeue-deserialize-update-something-in-redis use case to be the "worst case" comparison and that the more work the go worker does, the better the performance numbers would look.
It seems like using something designed from the start for cross-technology communication (like 0mq) may be a better idea, but I suppose there is already lots and lots of existing resque client code out there.
Since jobs are just encoded as JSON into Redis, it's already pretty cross-technology compatible. Furthermore, we're committing to a format specifically for Resque 2.0, so you can count on staying interoperable.
You can replace a single Resque worker to see how the performance improves, without changing the front-end queuing or having to run an additional queuing server.
> In real world use cases, how often is that kind of overhead a significant portion of total run time, compared with the actual work?
I wish more people paid attention to questions like this. Be it web servers, app servers, whatever; too many people look at microbenchmarks and decide that a move is imperative when they're really only optimizing a tiny fraction of the total run time.
> Of course, your actual work payload may be faster too in go than ruby if you know how to write it well.
I think that this is the big bet you're going to make when you choose goworker over a Ruby implementation.
What makes this intriguing is that in a web app scenario, background workers are often used for tasks that would take too long to run in the HTTP request/response cycle. This means there is a selection bias toward tasks that may be computationally difficult. That's not to say that all tasks handed off to background workers are CPU bound; many (most, maybe?) are I/O bound, but for the class of problems that are CPU bound, this offers a great solution.
But for the CPU-bound cases, people are usually calling into a library implemented in C either way (this is why Resque uses a forking model instead of green threads, after all.) The only speed-up you'll get is in the glue, because the task itself will probably use the same C library.
If you're already doing the majority of your work (total run time) in C code, then of course you're not going to benefit from a Golang back-end, but I'd challenge the "usually" portion of your claim. C wrapped in Ruby is not the same as pure C or Golang.
The tool-selection argument quickly devolves into the usual arguments between the benefits of languages like Ruby versus lower-level languages, but I'm not sure that's the most productive conversation. That's well-traveled ground.
What I'm saying is that I'm happy to see this (Golang workers available through a Resque API) as an option. I'm not advocating blindly replacing C implementations (or any other implementation), but I wouldn't be surprised to see a Golang back-end outperform Ruby code, even when that Ruby code is calling into C libraries. The usual rules apply when making that evaluation. You have to see for yourself.
Yup. I have a distributed worker system with all the workers in PHP. The majority of the time is spent waiting on database requests and external HTTP calls. Even if the speed of the code were instantaneous, it wouldn't make much of a performance difference overall.
Even though most of the time is spent on I/O, goworker can get wins from its super cheap concurrency. I was able to run 100,000 workers on my single 4GB RAM laptop.
How would you write integration tests if you had a rails app that pushed jobs out to goworker?
Say a user clicks a link that creates an async job that does a API call somewhere. What's a good way to test that entire process at once? You can easily test them separately, that clicking a link creates an async job, and that the worker can process the job, but it's useful to test the whole system at once.
That's a great question that I haven't thought about before. You could possibly write a Go wrapper that executes your workers, getting the job from a command line argument rather than Redis. https://gist.github.com/benmanns/6584142
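As a rough illustration of that idea (this is my own sketch, not the contents of the linked gist, and `runJob` is a hypothetical name), the wrapper would take the job as a JSON argument and dispatch it directly, bypassing Redis so a Rails test suite can shell out to it:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// runJob executes one worker for a job passed as a JSON string, skipping
// Redis entirely -- handy for driving Go workers from an integration test.
func runJob(raw string) error {
	var p struct {
		Class string        `json:"class"`
		Args  []interface{} `json:"args"`
	}
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return err
	}
	switch p.Class { // stand-in for whatever workers the app registers
	case "Hello":
		fmt.Println("hello:", p.Args)
		return nil
	}
	return fmt.Errorf("unknown class %q", p.Class)
}

func main() {
	// Default to a demo job so the program also runs without arguments.
	raw := `{"class":"Hello","args":["world"]}`
	if len(os.Args) > 1 {
		raw = os.Args[1]
	}
	if err := runJob(raw); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```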
In my side-project, which is my first foray into Rails, I use a Resque background worker to do processing and it takes a significant amount of time. As I've heard more about Go, I've wondered how I could use it to speed up this background process. Here is my answer! Thank you very much, I'm looking forward to trying this out as it fits my needs exactly.
True, but I thought that would look disingenuous, because it would make the worker times look like they scale logarithmically when they are about as linear as you can get.
Ah, yes. I'm using Flot and was having trouble getting good looking labels. I'll work on that right away, but in the meantime: X-axis is jobs, Y-axis is seconds to execute.