This is a weekend project I started on as a way to learn Go that I quickly realized would be something worth sharing. It lets you queue jobs from Rails that are processed in Go, which can speed things up considerably.
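For a concrete picture of the moving parts, here is a stdlib-only sketch of the core mechanism: Rails pushes a Resque-format JSON blob into Redis, and a Go process pops it and dispatches to a registered function. The names (`registry`, `dispatch`, `workerFunc`) are my own stand-ins, not necessarily goworker's actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload mirrors the JSON that Resque stores in Redis,
// e.g. {"class":"Hello","args":["world"]}.
type payload struct {
	Class string        `json:"class"`
	Args  []interface{} `json:"args"`
}

// workerFunc is the shape a Go worker takes: the queue name plus decoded args.
type workerFunc func(queue string, args ...interface{}) error

// registry maps Resque class names to Go worker functions.
var registry = map[string]workerFunc{}

func init() {
	registry["Hello"] = func(queue string, args ...interface{}) error {
		fmt.Printf("hello from queue %s: %v\n", queue, args)
		return nil
	}
}

// dispatch decodes one job blob (as popped from Redis) and runs the
// matching registered worker.
func dispatch(queue string, raw []byte) error {
	var p payload
	if err := json.Unmarshal(raw, &p); err != nil {
		return err
	}
	w, ok := registry[p.Class]
	if !ok {
		return fmt.Errorf("no worker registered for class %q", p.Class)
	}
	return w(queue, p.Args...)
}

func main() {
	// Stand-in for a blocking pop from resque:queue:default.
	_ = dispatch("default", []byte(`{"class":"Hello","args":["world"]}`))
}
```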
I would appreciate all kinds of feedback, and for anyone who wants to try goworker out, if you give me your email at the bottom of the page, I'll send you my contact info for some free one-on-one help getting started.
Very cool - I just interviewed with a company that is using Sidekiq to process up to millions of jobs (at about 2/sec per worker) and I asked them to look into using Go for workers. I can send this to them.
Hmm. I took a look at this and it could be a possibility, but RQ's message format is much different from Resque and Sidekiq's. See https://gist.github.com/benmanns/6583712 for a comparison.
Hello there! Resque maintainer here. This is super neat! I'll make sure to link to goworker in our docs. I have wanted to do something like this, but haven't gotten around to it yet.
Please let me know how I can help you stay abreast of what we're doing with Resque 2.
Hello! Thanks for your comment and tweet. I've been watching both Resque and Resque 2 while writing goworker. I would definitely like to stay in contact with you.
We're in the midst of switching quite a large Sidekiq deployment so that inbound work is done in NSQ. At a high level, NSQ allows for different strategies like fanout (having the same msg processed by different kinds of workers) without having to build them into a lib. This work involves outgoing API requests and minimal scraping.
We're keeping our outbound work in sidekiq for now (less throughput; not as necessary to have fast deserialization). It would be interesting to move all the workers to Go (and explore the advantages of the libraries linked here) as well.
How do you handle when Heroku sends a SIGTERM to kill your process? I couldn't find a way to preempt running workers, so everything running on Heroku has to finish within 10 seconds or you can lose it.
Author of go-workers here. On SIGTERM, I stop accepting new work, and wait for all running workers to finish before halting.
If workers take longer than the 10 seconds Heroku gives you, go-workers uses reliable queueing (using http://redis.io/commands/brpoplpush) so the job will run again next time you start up the process.
Okay, cool. I do the same thing on the polling side, but don't use reliable queueing yet. I think that is probably the best way to handle the failures.
I took a look, and am a little confused. Does the code for the workers need to be written in Go, not Ruby?
Have you also tested the performance of this with 40-50 workers doing intensive I/O work such as crawling web pages? One of the main disadvantages of Sidekiq is that the workers just freeze every 15-30 minutes when you have a ton of workers crawling web pages. The only workaround for me is to set up a cron job that restarts the workers if I detect this pattern.
So workers written in Go are 1,000x faster than workers written in Ruby? It almost seemed from the documentation that the workers were written in Ruby as well.
They are typically 10x faster in "raw" speed and 100x faster for concurrency. However, since Go's goroutines take almost no memory, you can run over 1,000 workers in the same amount of memory as one Ruby worker.
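The "over 1,000 workers in the memory of one Ruby worker" point comes from goroutines starting with only a few KB of stack. A minimal sketch (`runPool` is my own stand-in, not goworker's API) of fanning jobs out to a thousand concurrent workers:

```go
package main

import (
	"fmt"
	"sync"
)

// runPool fans jobCount jobs out to n concurrent workers. Each worker is a
// goroutine, which begins with only a few KB of stack, so n can comfortably
// be in the thousands on a single machine.
func runPool(n, jobCount int) int {
	jobs := make(chan int)
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs { // each worker pulls from the shared queue
				mu.Lock()
				done++
				mu.Unlock()
			}
		}()
	}
	for j := 0; j < jobCount; j++ {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
	return done
}

func main() {
	fmt.Println(runPool(1000, 10000)) // 1,000 workers, 10,000 jobs
}
```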
On the queuing side, you use Ruby. For the examples, I was trying to show what the "equivalent" Ruby worker was so you could get an idea of what kind of workload it was. However, I do see that it is confusing.
I suppose the main use case is working on data loosely coupled with the web app, then? For example, if we have to do any big processing on active_record objects, we would have to reimplement the whole model stack in Go?
You also forget the competition from Javascript as well. Rewrite all of the things in all of the languages.
I'd much rather see neat algorithms and libraries being re-written in C with an FFI-friendly API. I think it would reduce the number of "Show HN: I rewrote X in Y" posts. The majority of HN users rarely care about Y. Such posts are just attention-grabbing noise.
If it was written in C you could just write bindings to it in Y and nobody but Y developers would need to hear about it. Instead we could read about X on HN. X is interesting.
Actually, I've wondered if there's a docker for libraries in different languages. The closest I've found are http APIs at the application/service level, or libraries mature enough to be run at the command line, so you can use them in conjunction with other unix commands.
It doesn't seem right that we can't leverage libraries in other languages without significant effort.
Author of Jesque here. Jesque has numerous additional features over Resque, including the ability to dynamically change the queues that a worker polls and an efficient pooled worker implementation. Also available is a Java version of resque-web [1]
Regarding goworker: so the worker logic has to be written in Go, correct? For my Rails app, if it's something simple like sending email, I can use that, but if it involves something where I need ActiveRecord, is it still a good option?
> For simple dequeuing, deserialization, execution, and updating stats (as with the Hello worker above), goworker is 80x faster than Resque and 23x faster than Sidekiq.
In real-world use cases, how often is that kind of overhead a significant portion of total run time, compared with the actual work? If it's a small portion, wouldn't that make the speedup fairly irrelevant? Does it actually speed up real-world use cases significantly?
Of course, your actual work payload may be faster in Go than Ruby too, if you know how to write it well.
You're right that this is less significant if you are hitting an empty queue (triggering a timeout) between each request.
However, even when not comparing (seconds to compute 10,000 jobs) benchmarks, you still get benefits from goworker. Since dequeuing, deserialization, and updating stats are faster, the latency between "job insert" and "job started" is lower, which means faster interface updates.
You can also restructure your workers. Instead of running one worker that loops 5,000 times because the Redis overhead is too high, you can have one job queue 5,000 more which can be run on several distributed workers.
Your conclusions about latency, and being able to restructure things so tasks are much smaller -- make sense. (Potentially; I'd still want to benchmark my own real world app before assuming!)
(I am confused with your first paragraph and a half, I think you have some typos in there, or I just don't understand what you're saying. But I understand what you're saying about latency and about being able to restructure tasks to be smaller, thanks.)
I mean that if your users are waiting on jobs, they get the result when `encode time` + `enqueue time` + `wait time` + `dequeue time` + `decode time` + `execute time` is completed.
Even though you might not need raw 1000 jobs/second throughput, your users will still benefit by reducing `wait time` + `dequeue time` + `decode time`, which goworker does.
I would actually expect the simple dequeue-deserialize-update-something-in-redis use case to be the "worst case" comparison and that the more work the go worker does, the better the performance numbers would look.
It seems like using something designed from the start for cross-technology communication (like 0mq) may be a better idea, but I suppose there is already lots and lots of existing resque client code out there.
Since jobs are just encoded as JSON into Redis, it's already pretty cross-technology compatible. Furthermore, we're committing to a format specifically for Resque 2.0, so you can count on staying interoperable.
You can replace a single Resque worker to see how the performance improves, without changing the front-end queuing or having to run an additional queuing server.
> In real world use cases, how often is that kind of overhead a significant portion of total run time, compared with the actual work?
I wish more people paid attention to questions like this. Be it web servers, app servers, whatever; too many people look at microbenchmarks and decide that a move is imperative when they're really only optimizing a tiny fraction of the total run time.
> Of course, your actual work payload may be faster too in go than ruby if you know how to write it well.
I think that this is the big bet you're going to make when you choose goworker over a Ruby implementation.
What makes this intriguing is that in a web app scenario, background workers are often used for tasks that would take too long to run in the HTTP request/response cycle. This means there is a selection bias toward tasks that may be computationally difficult. That's not to say that all tasks handed off to background workers are CPU bound; many (most, maybe?) are I/O bound, but for the class of problems that are CPU bound, this offers a great solution.
But for the CPU-bound cases, people are usually calling into a library implemented in C either way (this is why Resque uses a forking model instead of green threads, after all.) The only speed-up you'll get is in the glue, because the task itself will probably use the same C library.
If you're already doing the majority of your work (total run time) in C code, then of course you're not going to benefit from a Golang back-end, but I'd challenge the "usually" portion of your claim. C wrapped in Ruby is not the same as pure C or Golang.
The tool-selection argument quickly devolves into the usual arguments between the benefits of languages like Ruby versus lower-level languages, but I'm not sure that's the most productive conversation. That's well-traveled ground.
What I'm saying is that I'm happy to see this (Golang workers available through a Resque API) as an option. I'm not advocating blindly replacing C implementations (or any other implementation), but I wouldn't be surprised to see a Golang back-end outperform Ruby code, even when that Ruby code is calling into C libraries. The usual rules apply when making that evaluation. You have to see for yourself.
Yup. I have a distributed worker system with all the workers in PHP. The majority of the time is spent waiting on database requests and external HTTP calls. Even if the speed of the code were instantaneous, it wouldn't make much of a performance difference overall.
Even though most of the time is spent on I/O, goworker can get wins from its super cheap concurrency. I was able to run 100,000 workers on my single 4GB RAM laptop.
How would you write integration tests if you had a rails app that pushed jobs out to goworker?
Say a user clicks a link that creates an async job that does a API call somewhere. What's a good way to test that entire process at once? You can easily test them separately, that clicking a link creates an async job, and that the worker can process the job, but it's useful to test the whole system at once.
That's a great question that I haven't thought about before. You could possibly write a Go wrapper that executes your workers, getting the job from a command line argument rather than Redis. https://gist.github.com/benmanns/6584142
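As a rough illustration of that idea (this is my own sketch, not the contents of the linked gist, and `runJob` is a hypothetical name), the wrapper would take the job as a JSON argument and dispatch it directly, bypassing Redis so a Rails test suite can shell out to it:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// runJob executes one worker for a job passed as a JSON string, skipping
// Redis entirely -- handy for driving Go workers from an integration test.
func runJob(raw string) error {
	var p struct {
		Class string        `json:"class"`
		Args  []interface{} `json:"args"`
	}
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return err
	}
	switch p.Class { // stand-in for whatever workers the app registers
	case "Hello":
		fmt.Println("hello:", p.Args)
		return nil
	}
	return fmt.Errorf("unknown class %q", p.Class)
}

func main() {
	// Default to a demo job so the program also runs without arguments.
	raw := `{"class":"Hello","args":["world"]}`
	if len(os.Args) > 1 {
		raw = os.Args[1]
	}
	if err := runJob(raw); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```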
In my side-project, which is my first foray into Rails, I use a Resque background worker to do processing and it takes a significant amount of time. As I've heard more about Go, I've wondered how I could use it to speed up this background process. Here is my answer! Thank you very much, I'm looking forward to trying this out as it fits my needs exactly.
True, but I thought that would look disingenuous, because it would make the worker times look like they scale logarithmically when they are about as linear as you can get.
Ah, yes. I'm using Flot and was having trouble getting good looking labels. I'll work on that right away, but in the meantime: X-axis is jobs, Y-axis is seconds to execute.