The performance tests here seem about useless. When memcached is slower than all your databases, it's a good indicator that after the first call the databases simply cached the request. To get meaningful performance metrics you would need sizable databases and more realistic use cases. It's not enough to just publish a disclaimer; why not do it right in the first place?
It also cites transactions as a benefit of MySQL, then runs the performance tests against the MyISAM engine, which doesn't support transactions. That also explains why the MySQL inserts might be faster than all the others.
These "NoSQL" comparisons are a tricky thing, inasmuch as this is a very volatile space with a lot of really interesting development happening. Furthermore, benchmarking a handful of storage options that are alike only in what they are not (relational, SQL) is a lot like running road tests on a handful of cars that have nothing in common except not being Honda Civics.
I made the foolish decision a while back to choose a data storage option based on benchmarks and hype, and I'm currently paying for it by having achieved spectacular launch failure and now completely rearchitecting my backend.
NoSQL emerged only after a decade or so of webscale challenges helped us finally realize that everything only looked like relational nails because the only tool we had was an SQL hammer, and that tackling every data warehousing problem with SQL was coming at the cost of money, engineering hours, and headaches. Now that viable, production-ready alternatives have become available, it's important that design decisions go beyond seeing problems as not-nails and instead see them for what they are.
Are you looking for reliable, versioned master-master replication for embedded devices? Might want to consider Couch. An obscenely fast caching tier that need not persist to disk? Sure, check out Scalaris. A fully distributed and pluggable map-reduce architecture capable of storing a metric fucktonne of data? Hbase might be for you. And, it should go without saying, there are still and there will always be problems for which SQL is the best solution.
I made the foolish decision a while back to choose a data storage option based on benchmarks and hype, and I'm currently paying for it by having achieved spectacular launch failure and now completely rearchitecting my backend.
Can I ask what you used - and how it hurt you? Just so I can avoid making the same mistakes ...
One thing I see people rarely mention in these discussions is that since CouchDB is master<->master, you can design applications completely differently: peer-to-peer.
Thanks. The important thing about CouchDB is that it allows you to write applications you could never dream of, with the standard LAMP stack. p2p replication makes a huge difference.
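A toy sketch of what master-master buys you (to be clear, this is not CouchDB's actual replication algorithm; it's just a last-writer-wins merge over made-up (value, version) pairs, showing that either peer can accept writes and the two can reconcile later):

```python
# Each peer is a dict of {key: (value, version)}. Merging takes the
# entry with the higher version, so writes can land on either side
# and a later sync brings both replicas to the same state.

def merge(a, b):
    merged = {}
    for key in set(a) | set(b):
        va = a.get(key, (None, -1))
        vb = b.get(key, (None, -1))
        merged[key] = va if va[1] >= vb[1] else vb
    return merged

peer1 = {"doc1": ("hello", 1)}                              # wrote v1
peer2 = {"doc1": ("hello, world", 2), "doc2": ("bye", 1)}   # wrote v2 + new doc
synced = merge(peer1, peer2)
# Both peers can now adopt `synced` and agree on the newest versions.
```

Real systems (CouchDB included) track revisions and surface conflicts instead of silently picking a winner, but the peer-to-peer shape is the same.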
The title is slightly misleading since they also include PostgreSQL, MySQL and Memcached in their tests. They conclude:
* MongoDB and Tokyo Tyrant are useful now. CouchDB has promise, but is too slow currently.
* Non-relational databases have shown their worth at larger sites when used cleverly.
* Non-relational databases will continue to improve performance, stability & features.
* Relational databases are still a great choice: fast, powerful and proven. With caching, denormalization, rework (e.g. Drizzle) & better replication, they will continue to be competitive.
I'm calling total bullshit on those performance graphs. Neither Tokyo Tyrant nor MySQL is anywhere near as fast as memcached unless you have a really bad memcached client (and they do exist).
I got somewhere over 90k stores per second on my macbook localhost with my java client. If he's only getting around 3k, he must be doing them by hand.
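For reference, a stores-per-second measurement loop looks roughly like this (an in-process dict stands in for a real memcached client here, so the absolute number is meaningless; the point is the methodology, and that the client loop itself shouldn't be the bottleneck):

```python
import time

def ops_per_second(n=100_000):
    store = {}
    start = time.perf_counter()
    for i in range(n):
        store[f"key:{i}"] = b"value"   # one "set" per iteration
    elapsed = time.perf_counter() - start
    return n / elapsed

print(f"{ops_per_second():,.0f} sets/sec")
```

If a harness reports ~3k sets/sec against memcached on localhost, it's worth timing the client loop alone like this first; the overhead is often in the client, not the server.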
Nice writeup. My current favorite is MongoDB (I blogged a few days ago on an easy way to index and search text in MongoDB docs), but I still also use CouchDB (and covered CouchDB in my last APress book).
I have played a little with Tokyo Cabinet/Tyrant but find it a little too 'low level' - would probably be great if you only need a fast hash. Cassandra is also worth a good look: if you are a Ruby developer, the Cassandra gem can auto-install all of Cassandra and manage it for you - very elegant, really.
I admit that one reason I like MongoDB is that it is so simple to use, and the documentation is good.
Having just read the paper (really, a few screens of a presentation) I feel confident that it's safe to ignore these "findings". Something is amiss. No way memcache is slower than MySQL for inserts/writes, and on-par for queries. Just doesn't make any sense.
Hard to tell. You can "speedup" MS Access to Redis speeds, with sufficient RAM and a good caching strategy.
If CouchDB is slow in one benchmark, the developers can just go out and do something "nice", like, memory mapping all the documents in the "database", and serving them with a tight epoll(2) dispatch system using sendfile(2).
Speeding up databases for a benchmark is easy; scaling them for the real world is the tricky part.
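To be concrete about what "a good caching strategy" buys you in a microbenchmark, here's a minimal read-through cache sketch (all names are mine; the "slow" backing store is simulated):

```python
# Minimal read-through cache: only the first read of a key hits the
# backing store, so any repeated-read benchmark mostly measures RAM.

class ReadThroughCache:
    def __init__(self, load_fn):
        self._cache = {}
        self._load = load_fn
        self.misses = 0

    def get(self, key):
        if key not in self._cache:
            self.misses += 1              # only this path is slow
            self._cache[key] = self._load(key)
        return self._cache[key]

def slow_lookup(key):
    return key.upper()   # pretend this hits disk

cache = ReadThroughCache(slow_lookup)
cache.get("a"); cache.get("a"); cache.get("a")
# cache.misses == 1: two of the three reads never touched the "database"
```

Which is exactly why a benchmark of hot, repeated keys tells you almost nothing about the store behind the cache.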
The thing we see with CouchDB in production is that, although under low load we aren't the fastest kid on the block, as you ramp up concurrency (hundreds or thousands of simultaneous clients) with mixed reads and writes, on a multi-GB database, we don't slow down.
Building a test harness is non-trivial, Igal's tests are a lot like ones I used in the CouchDB book to illustrate the importance of batch updates, so I don't blame him for the imprecision.
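For anyone who hasn't seen the batch-update effect in practice, here's a rough sketch using stdlib sqlite3 as a stand-in (the table and numbers are mine, not from Igal's harness): committing per row pays the durability cost once per insert, while one commit per batch amortizes it, and that difference alone can dominate an insert benchmark.

```python
import sqlite3

def insert_rows(conn, rows, batch):
    cur = conn.cursor()
    if batch:
        cur.executemany("INSERT INTO kv VALUES (?, ?)", rows)
        conn.commit()                     # one commit for the whole batch
    else:
        for row in rows:
            cur.execute("INSERT INTO kv VALUES (?, ?)", row)
            conn.commit()                 # one commit (and sync) per row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT, v TEXT)")
rows = [(str(i), "x") for i in range(1000)]
insert_rows(conn, rows, batch=True)
print(conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0])  # 1000
```

Run it against an on-disk file with batch=False and the per-row commits get dramatically slower; CouchDB's bulk-document API exists for the same reason.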
On this naive microbenchmark yes. Notice how fast CouchDB is when dealing with multiple documents (fetch all). Also I'm sure that on a benchmark showing many clients fetching many documents (which is a likely scenario in many applications), CouchDB would shine compared to other databases.