Ask HN: Real-world anecdotes of MySQL at scale?
76 points by herodoturtle on Sept 27, 2022 | hide | past | favorite | 95 comments
What is the biggest MySQL database you've ever worked with?

Any interesting stories to tell?

I'm trying to get a sense of what MySQL is like at massive scale.

Thanks ^_^



I used to work at Second Life. Their player inventory and monetary transaction system was in MySQL. Every single item a player had or made was a record in this database. If you’ve never spent time in SL then the scope of this is probably a lot bigger than you imagine. As an active creator in SL I had 10s of thousands of records in there myself for inventory, and lots more activity for financial transactions and history. The database was distributed into somewhere between 30 and 50 servers, each of which had a read only replica. User inventory was divided amongst these shards. Any transactions that could execute on the read only replica did, and transactions that needed to be placed on the main server for a user executed there.

There were always concerns about the latency involved in the replication. Whenever I did large scale batch operations that needed to write lots of records in master servers, I would monitor the replication queue length and suspend operations when it got too far behind to keep things sane and healthy.
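The monitor-and-suspend approach described above can be sketched as a simple throttling loop. This is an illustrative sketch, not Second Life's actual tooling; `get_lag` is a hypothetical probe that in practice would run `SHOW REPLICA STATUS` against each replica and return the worst `Seconds_Behind_Source`.

```python
import time

def run_batches(batches, get_lag, max_lag=30, backoff_seconds=1):
    """Apply write batches, pausing whenever replica lag gets too high."""
    applied = 0
    for batch in batches:
        while get_lag() > max_lag:
            time.sleep(backoff_seconds)  # back off until replicas catch up
        batch()  # perform this chunk of writes on the master
        applied += 1
    return applied
```

The key point is that backpressure lives in the batch job, not the database: the job voluntarily slows down so replicas stay close enough to the master to keep serving reads safely.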

The biggest failure in this architecture was sharing sql queries as opposed to sharing service calls. This is something that was being fixed as I left. Switching it to work with service calls gives more flexibility in the underlying implementation and opens more options for improving the system.

For most kinds of large systems I’d recommend thinking about the architecture in terms of distributed Actors hosted in something like DAPR, Orleans, Akka, or Erlang/OTP and then the technical details of the underlying database become less of a constraint. It grants you flexibility for your underlying storage that you would not otherwise have.

Fun side fact: Bos, the CTO of meta, was a key person working on making all the magic happen there back then.


> The biggest failure in this architecture was sharing sql queries as opposed to sharing service calls

noob question but what do “sharing queries” and “sharing service calls” mean?


Suppose there are several different major components of a large system that need to know something about an important entity like a user for example. A good way to share that information is to have a user service that answers those questions (in a larger system) or a user library/data provider that does this (in a smaller system). A worse way to do that is to have sql queries that different systems use to get that data.


Aren't shared queries faster, though?


It’s one less layer, so yes. But going through a service gives you the opportunity to use caching, cache invalidation, or to switch the operation to a different kind of data store. So it can potentially be faster sometimes. Also, if the database is occasionally burning up from being overworked, then survival and reliability are much more important than speed.


A service layer is also a single point of failure. So if uptime is a concern, having one less component that a developer can accidentally take out and bring down the whole system is valuable.

One of the most pernicious dangers in shipping a successful project is you paradoxically both cargo cult and critique random parts of your success. Looking back, you might decide that choice A was correct, while choice B was incorrect. But you don't have the benefit of having tried all of those other options. So don't trust postmortems too much, they are single data points in a sea of failures. They could very well be random chance.


A database noob question: when you say you distribute the database across 50 servers, these are all shards, correct? MySQL (or any standard RDBMS) can't span more than one server, correct?


> when you say distribute the database to 50 servers all these are shards, correct?

Yes, these are separate server instances that are essentially independent of each other. All of the records for a given user live on a single particular server. Although MySQL doesn't do this kind of partitioning inherently, it can be implemented in the application.
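Application-level partitioning like this can be sketched in a few lines. The host names and shard count here are made up for illustration, not Second Life's actual setup; the point is that the application, not MySQL, decides which server owns a user's records.

```python
# Fixed fleet of shard hosts; every user maps to exactly one of them.
SHARD_HOSTS = [f"mysql-shard-{i:02d}.internal" for i in range(50)]

def shard_for_user(user_id: int) -> str:
    """All records for a given user live on this one server."""
    return SHARD_HOSTS[user_id % len(SHARD_HOSTS)]
```

The application opens a connection to `shard_for_user(uid)` for any query touching that user, so single-user transactions never cross servers.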


Is there a write-up somewhere on how you handled cross-database transactions? E.g. Bob on db#12 gifts Alice on db#23 a "MegaPlushie(tm)" item?

Interested to know how you'd avoid duplicates and lost transactions while maintaining a comfortable level of performance/latency for an online game.


With MySQL this can be handled via XA transactions: https://dev.mysql.com/doc/refman/8.0/en/xa.html


It would be cool if there were a free, open-source program that did this well, written once so we could all benefit from it.


It's not something that you can easily abstract away from the application's concerns without a significant performance penalty. For the most basic CRUD operations, an ORM could probably do a decent job. Surely someone has already written an ORM that supports sharding?

Beyond CRUD, I'm not so sure, and any application that's big enough to use sharding probably has some pretty complicated queries. At that point you really don't want to be depending on a library to handle the low level details. You need to think about the performance implications of the access patterns and the trade-offs inherent in different design choices. If a library makes these choices for you then you are unlikely to get optimal results.


There’s Vitess that’s been mentioned on HN a lot recently. https://vitess.io/


Sharding is when you store your tables, or parts of tables, on other machines, so that queries are executed by the resources of those remote machines.

Read-only replication stores a copy of the whole database on other machines, so that any query which doesn't write can be handled by asking the remote machine. Writes are bottlenecked through the primary machine, which then sends changes to the remotes.


> For most kinds of large systems I’d recommend thinking about the architecture in terms of distributed Actors

Did you do this at SL? Because this could be a case of "If I had to do it over again I'd make it perfect" (they say hindsight is 20/20, but it only seems to be).


Facebook is backed by MySQL https://engineering.fb.com/2021/07/22/data-infrastructure/my...

(though it's only used as the backing data store for their graph database https://www.usenix.org/system/files/conference/atc13/atc13-b...)


> only

This is nearly all the data stored by Facebook. Every profile post, comment, photo, story, marketplace listing is a row in one MySQL table. Even internal tasks and code changes are rows stored in this one table. I don’t think a bigger MySQL table exists.

Fun fact - You can also access any object if you know its ID with Facebook.com/id. For example, Mark Zuckerberg’s profile was the 4th object created, so that’s where https://facebook.com/4 leads.


This is right. Source: worked on the FB MySQL team, though it was a long time ago. There is also massive, massive associated caching infra (memcached, Tao). There isn't as much hoopla about it but there are some public blog posts. I think it's even more interesting than the backing store.

The UDB tier used to use the InnoDB engine; nowadays it's MyRocks: https://research.facebook.com/publications/myrocks-lsm-tree-...


There are a couple of blog posts and associated papers on TAO:

https://engineering.fb.com/2013/06/25/core-data/tao-the-powe...

https://research.facebook.com/publications/flighttracker-con...

https://engineering.fb.com/2021/08/18/core-data/ramp-tao/

https://engineering.fb.com/2022/09/07/open-source/taobench/

I work on TAO consistency at FB and we've put out a bunch more articles/papers in the past couple years. Until 2 years ago, there was just the 2013 paper on TAO itself.


My memory is fading years after leaving FB, but I don't believe "one table" is correct at all. As far as I can recall, UDB's design permits (but does not require) each object type to have its own table; ditto for association types. I believe different object tables have the same structure / flexible serialization strategy, but the important ones are each distinct tables within a shard.


To clarify, since you worded this a bit ambiguously: MySQL is the backing store for the graph database, which is the primary Facebook product database tier. But separate from that, MySQL is also used for a ton of other purposes at Facebook.

Or at least it was as of a few years ago; AFAIK this is still the case. For example the internal MySQL DBaaS tier had hundreds of distinct use-cases throughout the company.


Assuming "at scale" means you have no other option than horizontal scaling.

At this point differences between databases start to shrink. Most of the complexity and features are being moved to the application layer, and your database is becoming dumber.

More subtle (and frankly boring) features are becoming important. Like how cost-effective is it to run, observe, backup/recover, etc.

Honestly, I feel like the most interesting stories happen before "at scale". Stories about how you delayed the need to "scale" for several years (and probably saved the company by allowing it to focus on product instead of infra). But unfortunately those stories aren't something people tend to brag about. Nobody writes stories about how they chose to be pragmatic and not scale because they could just buy a bigger server or add a caching layer.


> Nobody writes stories about how they chose to be pragmatic and not scale because they could just buy a bigger server or add a caching layer.

That would be a pretty boring talk, to be fair.


Nope, most of the big transactional databases run on Oracle. You can horizontally scale using RAC, though of course you need shared storage.


I operate around $300k/month of MySQL infra. We service most schools in the US with a Netflix-type service. We have burst traffic (at noon EST traffic goes up 50x). Here is what we learned:

1) RDS/Cloudsql is awesome but too expensive, we can't afford it. The things it provides are not that special with VMs anyway (images, backups, replication).

2) The industry moved to vitess db, but converting to this is too daunting, we are focused on automation and cost reduction instead.

3) Use index-based sharding (no math shards % total, no consistent hashing, no drama). When you get a new user, set up where they should be on each shard and keep that in a single table [user_id, video_cluster_id, session_cluster_id, ...]. This way moves are easy. Then video_cluster_id is a load-balanced host in front of a fleet of replicas.

4) The hardest thing you will run into is write saturation. You can't write to a disk fast enough. Advanced topologies and 3) help here. Example: [MasterA] <- [SlavesA,0..n] <- [MasterB] <- [SlavesB,0..k]. This way you can push more writes to MasterB, and add replication filters so that the B slaves don't need to perform all the writes MasterA needs.

5) SSDs. Don't bother with anything else.

6) We would pay a million dollars for this tool: https://docs.percona.com/percona-toolkit/pt-online-schema-ch...

7) GTID took away the hard parts of mysql replication.

8) Hardcore MySQL experience is disappearing. With RDS/Cloudsql/Postgres, fewer and fewer developers are able to patch a production problem.
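The index-based sharding from point 3 above can be sketched as a directory lookup. This is a toy in-memory version (the real thing would be a MySQL table keyed by user_id); the win it illustrates is that moving a user is a one-row update, with no modulo rehash touching other users.

```python
# Directory: user_id -> which cluster holds each kind of data.
directory = {}

def assign_user(user_id, video_cluster_id, session_cluster_id):
    """On signup, record where each kind of data for this user lives."""
    directory[user_id] = {"video": video_cluster_id,
                          "session": session_cluster_id}

def cluster_for(user_id, kind):
    """Look up the cluster before routing a query."""
    return directory[user_id][kind]

def move_user(user_id, kind, new_cluster_id):
    """Rebalance: only this user's row changes; nobody else is affected."""
    directory[user_id][kind] = new_cluster_id
```

Compare this with `user_id % N` sharding, where changing N silently reassigns almost every user to a different shard.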


Dummy question, but what happens when your user data is in one shard but some data you need to join is in another? For example, all the bills of this user, or all the events they have done.


Let's say you have built an email database. Marketing and the ML team pump in tons of writes, and you use it while sending out emails. You have an account DB that holds the user information. What is AWESOME is that you can create a topology like: [accountDB,replicas] <- [emailDB,replicas], where emailDB only replicates the user table from accountDB using replication filters [1]. This way anyone writing a service can just do normal joins [select * from email,user where user.id=email.user_id] etc. ON the emailDB.

[1]: https://dev.mysql.com/doc/refman/5.7/en/change-replication-f...


PlanetScale is the way to leverage Vitess without any of the hassle.


what company is this?


I was in a system design interview where the interviewer didn't like that I chose MySQL for storage, because "it doesn't scale". I pointed out that I literally used to work at a B2C app in this exact problem space that exclusively used MySQL for storage. I haven't heard back from the company and it's been 2 weeks :-/


Well MySQL scales in much the same way that SQLite does – if you build sufficient tooling around it, you can scale it as much as you like.

MySQL comes with a bunch more built in than SQLite does, but Facebook and GitHub aren't scaling purely with the built-in functionality: they're writing their applications to be sharding-aware, they're writing custom proxies, they're running custom extensions, and they're picking and choosing which bits of core functionality they allow in order to minimise scaling issues.

I think in a systems design interview these issues need to be addressed, to show that you have the skills to be able to implement this, and I think that a good interviewer should be open to this discussion!


GitHub is using Vitess internally to avoid applications needing to be shard-aware.


That's unfortunate. I hate system design interviews where the interviewer has a "known" solution in mind and won't accept any other solutions even with good arguments, but you could have dodged a bullet if they are completely stubborn to work with.


You should tell them about Facebook, YouTube and GitHub, on top of your own personal experience (which might be harder to verify).


I mentioned Facebook. The problem was about storing product reviews, and I literally used to work at Yelp, which exclusively uses MySQL for the primary database. My current job also includes product reviews, and I am very familiar with how low-traffic review systems are. MySQL can definitely handle 10qps of writes.


Please leave a Glassdoor review so the world knows that their tech leads are clueless.


"MySQL for storage, because it doesn't scale"

haha.


> because it doesn't scale

Did they give an example of a database that "scales"?



LOL. Okay...


Pinterest uses (used?) mysql. Blog post is a bit old but a fun read https://medium.com/pinterest-engineering/sharding-pinterest-...


We read this numerous times before finally creating our own sharding system over bare MySQL


FB's MySQL team gave a tech talk on sharding circa 2011. A couple of bright engineers from Pinterest were there and asked a lot of great questions. A year or two later the same gentlemen presented their own MySQL infra at the Percona (?) conference. They'd used that talk as a springboard for their own design, which was quite clever and clearly integrated some of their unique business needs.

A decade later, I still remember how good their presentation was! I wish I could remember the engineers' names.



God, I'm building a big MySQL schema right now, and I wish I had read this months ago! Too late for me now. Oh well, next project!


Yes! Marty Weiner and Yashh Nelapati. Appears to be a different talk though. Nice find.


One thing to keep in mind when looking at older MySQL scale-out presentations and war-stories: they're mostly MySQL 5.0-5.6, and a chunk of them are bare metal rather than cloud. The problems you'll encounter today -- scaling out in a modern cloud environment on MySQL 8 -- may not line up too well with what folks hit in the past. Ditto for best practices on architecture and ideal max db instance size.

I gave a bunch of MySQL conference talks from 2011 through 2017, including several focused on scaling and sharding at social networks. Looking back through them today, a lot of content from the scaling/sharding ones honestly isn't too useful if you're trying to scale up a new MySQL deployment right now.


What has changed? Is it the performance characteristics of a VM vs bare metal? Is it the characteristics around replication in MySQL 8? Something else?

I imagine sharding, read replicas and replication are still a thing, right? Or am I wrong?

Is it the particulars or the big strokes that need updating?


It's just a completely different set of scaling pain points. The war stories and anecdotes of the old days just aren't very applicable anymore for newer companies.

For example, on bare metal, physically running out of disk space on a hot shard was a relatively common concern. It required a lot of monitoring, automation, and capacity planning to avoid. Whereas on something like AWS Aurora, you mostly just don't worry about it unless/until you're getting into the many-TB range.

On bare metal, building out a new data center / region is a huge task. Automating the process was even more complicated. Whereas with large cloud providers, you already have a lot of regions available, and nice APIs to automate things.

Hardware of 10-15 years ago had slower IO, less storage, way less RAM, less network bandwidth... and with bare metal you had to deal with multiple generations of hardware in some fashion, often both within a region / data center and between different regions / DCs. This gets especially ugly when you have replicas in each region with very different hardware profiles.

Meanwhile in terms of MySQL versions, there's been major strides in replication options/topologies/perf, crash safety, online schema change, automation via MySQL shell, JSON functionality, built-in physical copy/clone, many other things.

The third-party tooling ecosystem has also evolved substantially -- monitoring, deployment, HA/failover, proxies, sharding, schema management, backup/restore, etc. Large companies used to all solve this stuff in-house, often in messy home-grown ways... today there's great generic options that are basically usable out-of-the-box.


Are you referring to distributed MySQL such as Vitess? It is the backend for Slack and GitHub; also was the backend for YouTube in the past.

https://vitess.io/


Google "mysql booking.com" for a beast of an example. There's lots of content out there about that one.


There always seem to be countless examples of companies using MySQL at extremely large scale.

What are some Postgres examples?

As a Postgres fan, I'm genuinely curious. Please don't take my comments as troll-y.



I used to work for a company called "Venda" that was doing 1% of web* traffic in 2011, they used PostgreSQL as a backing store. Mostly because MySQL was corrupting data silently and that's not good.

* web indicates that it was HTTP traffic, a subset of all internet traffic



reddit


It is very rare that we get the subject of MySQL on the HN front page, considering HN has been pretty much anti-MySQL, so I will take this opportunity to ask:

Does anyone know if MySQL 9 is on the roadmap? Or are we going to get mostly small iterations and bug-fix releases only?


> It is very rare we get the subject of MySQL on HN Front-page considering HN have been pretty much Anti-MySQL,

Which is quite hilarious in the context of looking at all current answers mentioning MySQL as the database for plenty of high profile companies.


If not most


Shopify Engineering has blog posts you may find interesting:

https://www.google.com/search?q=mysql+site%3Ashopify.enginee...


Etsy, via sharding & a custom ORM — the 'Flickr architecture'.

In 2021: "We saw 1.8M queries per second on our massively sharded MySQL databases." https://twitter.com/codeascraft/status/1466174452459196420

Here's a post from a decade ago: https://www.etsy.com/codeascraft/two-sides-for-salvation


Vitess is by far the biggest I've seen in practice.

Sharding is at the root of storage management, which lets users design a strategy that fits their storage and request benchmarks. The biggest difference I've seen is that you no longer get to use many of the MySQL operations and features you may be used to, like relationships. It's analogous to the best and slightly painful parts of both MySQL and NoSQL: things like relationships get relegated to the application.


There are any number of presentations on Facebook operating MySQL at scale (eg [1][2]). I don't know how many servers they have but half a million wouldn't surprise me.

[1]: https://www.youtube.com/watch?v=NfS5ZLNPxS4

[2]: https://www.youtube.com/watch?v=kP6undC_HDE


We have around a 35TB MySQL deployment in master-slave config. The most interesting nightmare happened to us when doing a master failover using ghost: panicked manual intervention created a two-active-masters scenario, leading to chaos which we had to clean up for days, along with monetary loss :) Lesson learnt: don't do manual intervention while ghost is doing the failover.


Really the only thing we run into trouble with is `count(id)` where `id` is the pk of the table. This is surprisingly slow when you get above a couple hundred million rows. We're in 3rd normal form, with proper FKs and indexes.
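One common workaround for slow full-table counts (a sketch of a general technique, not something the commenter describes using): maintain a running counter that the application updates on insert and delete, so reads never have to scan hundreds of millions of rows. In MySQL this would typically be a one-row summary table updated in the same transaction as the write.

```python
class RowCounter:
    """Counter cache: pay a tiny cost per write to make counts O(1)."""
    def __init__(self, initial=0):
        self.count = initial  # seeded once from a real COUNT(*)

    def on_insert(self, n=1):
        self.count += n

    def on_delete(self, n=1):
        self.count -= n
```

The trade-off is that the counter can drift if any write path bypasses it, so a periodic reconciliation against a real `COUNT(*)` is prudent.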

I _highly_ recommend having someone or some team in charge of DB architecture. Use the "remarks" column to make a data dictionary that's stored right in the table. Have an _exact_ specification for table names (plural nouns), column names (agent_id (always a fk), agent_name), index names (udx-agents-agent_name), and fk names (fk-leads->agents).

But herein lies the rub: ALWAYS implement today's requirements, not next year's. If your app can be powered by a single MySQL instance, _just do it_. You'll be in a much better position to scale later than if you pick some excessive technology you _think_ you might need in the future.



Wikipedia uses MySQL and although I worked there for 7 years I never did much work directly with the DBs.

It's been 9 years since I worked at deviantart.com but last I knew the site still operated on a large sharded MySQL cluster divided into something like 32 individual shards where each user's data was on a shard determined by their userid.

Essentially just `shard = userID % shardCount`.

The application would connect to all of the servers and for many queries you have to send the query to all 32 servers, or some subset based on the records you seek, and then merge the results of all of those queries.
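The fan-out-and-merge pattern described above can be sketched as a scatter-gather helper. This is an illustrative sketch (the shard count matches the comment; the query function is a stand-in for a real per-shard MySQL connection).

```python
SHARD_COUNT = 32

def shard_for(user_id: int) -> int:
    """Single-user queries go to exactly one shard."""
    return user_id % SHARD_COUNT

def scatter_gather(query_one_shard, shard_ids=None):
    """Run a per-shard query against each shard and merge the rows."""
    merged = []
    for sid in shard_ids if shard_ids is not None else range(SHARD_COUNT):
        merged.extend(query_one_shard(sid))
    return merged
```

Queries keyed by user hit one shard via `shard_for`; anything else (search, global listings) pays the scatter-gather cost across all 32, plus whatever sorting or aggregation has to be redone in the application after the merge.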


Hey 20after4 ;)


They see me shardin’


The largest databases I've worked with were sharded, each with replicas and failover of writers. With 100s of shards each database is a smaller database and maybe doesn't even count as the largest I've worked with.

Large but smaller ones were federated (divided by tables rather than rows) also with group replication and replicas. The most interesting case was switching over a MySQL 5.6 InnoDB to a 5.7 TokuDB storage engine. Bringing up a new replica used to take hours to catch up depending on how old a snapshot you were starting with. Using TokuDB it took 10s of minutes, well under an hour. TokuDB isn't quite as effective as LSMTrees but MyRocks/RocksDB wasn't (isn't?) ready for 5.7 and last I checked had some stricter requirements.

The best thing about MySQL is how (relatively) easy it is to set up replication with multiple writers, and even load-balance/failover behind a TCP proxy like keepalived. Backups are pretty easy to script: stop replication on an instance, stop it, copy files, resume (and let it catch up on replication, then re-add it to the query pool).

When you scale vertically to 512GB memory servers using NUMA memory, things get tricky. Had to tweak the BIOSes to get the same spec'd machines to behave mostly the same w.r.t. thread utilization and settle on a memory buffer partitioning scheme that minimized lock waits. Also split off slow queries to 'reporting' read-replica instance(s).

Nowadays folks would probably prefer managed services like AWS Aurora.


This answer to your question is more about scaling out (many machines) than scaling up (big machines).

Many budget hosting providers supporting CMSs like WordPress and Joomla run very large multitenant MySQL or MariaDB instances. These instances often get into performance trouble when one or more of the customer sites on them grow large or get traffic spikes. The CMSs offer elaborate caching strategies (based on redis, memcache/d) to help mitigate this. But of course the multitenant nature of these instances means that individual customers must configure the caching. And the lowest-end hosting providers don't offer redis or memcache/d.

It's possible to monitor various operational parameters (via `SHOW GLOBAL STATUS` and similar SQL statements) to detect problems. It's very hard for customers to mitigate configuration problems (like insufficient buffer pool sizes or too few maximum connections). But customers can change table indexes.
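As one concrete example of the kind of check `SHOW GLOBAL STATUS` enables (a sketch, assuming the status rows have been fetched into a dict of name -> value strings): the InnoDB buffer pool hit rate, i.e. the fraction of page reads served from memory rather than disk, which directly reflects whether the buffer pool is sized adequately.

```python
def buffer_pool_hit_rate(status: dict) -> float:
    """Fraction of InnoDB page read requests satisfied from the buffer pool."""
    requests = int(status["Innodb_buffer_pool_read_requests"])
    disk_reads = int(status["Innodb_buffer_pool_reads"])
    return 1.0 if requests == 0 else 1.0 - disk_reads / requests
```

A rate that sags well below ~0.99 under load is the classic sign of an undersized buffer pool, which on shared hosting is exactly the parameter the customer usually cannot change.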

Other hosting providers operate large numbers of small separate MariaDB and MySQL instances, one per customer. Considering that WordPress powers something like 30 - 40% of the sites on the internet, the total number of instances is vast.

WordPress.org, a vast site, uses multiple read replicas.

Version upgrade inertia is a big problem for hosting providers. One provider (whom I will not name) runs a MySQL version that reached end-of-life well over a year ago. New-start MySQL / MariaDB project designs should make provisions for DBMS version updates. Updates are worth applying because the development teams do lots of performance-enhancement work.


I have one running with a ~5TB unpartitioned table, adding ~9M rows per day, with significant read load via 2 replicas. Works just fine on a 32 GB memory RDS instance. There is another table with half the number of rows, and I join the two thousands of times a day.

Took a bit of fiddling around to have things run stable. Currently migrating to Redshift, which has stellar performance but comes with its own problems.


A more down-to-earth example: Evernote is (almost entirely) on MySQL. Since most of the stuff is "yours", it's fairly easy to shard partitioning by user. And the global stuff which is not that much (users/auth/etc) is stored in beefy replicated instances.

Newer features are being built on other data stores, and some things are being migrated away, but the core stuff is still in MySQL.


https://github.blog/2018-06-20-mysql-high-availability-at-gi...

Depends what you mean by massive scale, but MySQL can certainly handle a website of, say, Stack Overflow's size on beefy hardware with read replicas. Facebook would be another story entirely.


Facebook uses MySQL. All the data is one MySQL table.


> All the data is one MySQL table

This is not correct at all.

Facebook has many distinct MySQL tiers (distinct sets of shards) supporting different purposes. And each tier has many tables.

The UDB tier (user database / primary product backing store) had several hundred tables when I worked on Facebook's MySQL team 7+ years ago.

This was before the MyRocks migration though, and it's possible the UDB layout changed to reflect MyRocks' extensively different file layout vs InnoDB. But in any case, especially keeping in mind all the various distinct tiers, saying all data at Facebook is one table is egregiously incorrect.


But not one physical table. It is certainly sharded across 1000s of machines. Basically a tuple store at that point, rather than an RDBMS.


You need to qualify massive scale; it can be on multiple dimensions:

* "Large" data (these days, 10s of TB of data, maybe with multi-TB individual tables, billions of rows in the large tables)

* "High" QPS - reads

* "High" QPS - writes

High QPS is in the eye of the beholder, but let's say something like a few hundred thousand QPS and up.

If you have scale in just one of these 3 dimensions, the solutions are relatively easy (except maybe for high write QPS, depends on the nature of the writes). If you are at scale on all three dimensions, solutions get harder (sharding, non-innodb storage engines).

Of course, there is also the wildcard: the nature of your queries/transactions. Large transactions along with scale will make your life a living hell.



When I left Fitbit we were still using MySQL as the primary datastore for all tracker data. Admittedly only in the 10s of millions of users, but tracker data was stored at sub second precision and indefinitely. It was definitely up in the 10s or 100s of TB. Unlike most DBs it was more write heavy than read heavy.

Our scale was handled by sharding based on userId and geo. We could dynamically shift the shards when scaling the cluster up and down. I wish we had open-sourced that sharding layer; it was pretty nifty.


vitess / planetscale might be a viable alternative set of open-source tooling for things like dynamic resharding.


oh yeah Vitess looks slick


I remember coming back to "Optimised Pagination using MySQL"[1] to avoid common pagination "slowness".

[1]: https://www.xarg.org/2011/10/optimized-pagination-using-mysq...

[more]: https://stackoverflow.com/a/32360867
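The technique in the linked article is the "seek" (keyset) method: instead of `LIMIT ... OFFSET`, which scans and discards all the skipped rows, you remember the last id you served and filter on it, so each page is a cheap index range scan. A minimal sketch (table and column names are illustrative):

```python
def next_page_query(table: str, last_seen_id: int, page_size: int = 50) -> str:
    """Build a keyset-pagination query: no OFFSET, just seek past last id."""
    return (f"SELECT * FROM {table} WHERE id < {int(last_seen_id)} "
            f"ORDER BY id DESC LIMIT {int(page_size)}")
```

Page N costs the same as page 1, whereas `OFFSET 1000000` forces MySQL to walk a million index entries before returning anything. (In real code you'd bind the values as query parameters rather than interpolate them.)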


Uber uses an in-house db called Schemaless which is built on top of MySQL for many of their largest datasets (many petabytes each).

https://www.uber.com/blog/schemaless-part-one-mysql-datastor...


EA uses MySQL for a lot of things, including its largest games like FIFA / FUT; they have thousands of servers.


You can read about one of major Slack outages here https://slack.engineering/slacks-incident-on-2-22-22/. MySQL included.


Google AdWords was run on MySQL for a long time: https://news.ycombinator.com/item?id=7645703


Look at Youtube, or PlanetScale.


Or Square, or Slack.

Following this thread closely - we are following in the footsteps of some of these co's, and scaling out is the way forward.


Let us know if you need any assistance. The above area all customers of ours.


Shopify, during the interview, pretty crazy stuff.


A ton of pain without a DBA team dedicated to it.


What is good "at scale" without a team dedicated to it?


Is this comment based on anything in particular?




