Evan Weaver from Twitter presented a talk on Twitter software upgrades, titled "Improving running components", as part of the "Systems that never stop" track at the QCon London 2009 conference last Friday. The talk focused on several upgrades performed since last May, while Twitter was experiencing serious performance problems.
A very interesting observation during the talk was that Twitter started out with a CMS model and gradually moved towards a messaging model. I’ve seen this in a few applications so far, including a casino system: the messaging model seems to be the best fit for an application intended to power a massive community of online users, seemingly regardless of what the application actually does business-wise. Applications start out completely different, but then so much functionality gets bolted on top of user messaging capabilities that the whole architecture eventually gets refactored to utilise the messaging channels as the core information transport. With Twitter, I’d have expected this to be more obvious from the start, as it was intended to help people notify each other.
For every incoming tweet, the messaging system gets notifications for all the followers, which are then processed asynchronously. One of the most important changes they introduced to improve performance in the last nine months was moving from a Ruby messaging middleware to a custom-built, JVM-based messaging middleware written in Scala.
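The fan-out described above can be sketched roughly as follows. This is a minimal, single-process illustration, not Twitter's code: the follower graph, the function names and the use of an in-memory `deque` in place of real messaging middleware are all assumptions made for the example. In a real deployment the `drain` step would run in separate worker processes.

```python
from collections import deque

# Hypothetical follower graph: author -> list of followers.
FOLLOWERS = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
}


def fan_out(queue, author, tweet_id):
    """Enqueue one delivery message per follower; return how many were queued."""
    followers = FOLLOWERS.get(author, [])
    for follower in followers:
        queue.append((follower, tweet_id))
    return len(followers)


def drain(queue, timelines):
    """Worker step: deliver every queued message into the follower's timeline."""
    while queue:
        follower, tweet_id = queue.popleft()
        timelines.setdefault(follower, []).append(tweet_id)


queue = deque()      # stand-in for the messaging middleware
timelines = {}       # stand-in for per-user timeline storage

fan_out(queue, "alice", 1)   # the request returns once messages are queued
fan_out(queue, "bob", 2)
drain(queue, timelines)      # delivery happens later, outside the request
print(timelines)
```

The point of the split is that posting a tweet only costs the time needed to queue the messages, while delivery can lag behind under load without blocking the sender.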
The shift from the CMS to the messaging model allowed them to rethink caching policies and shared work. Before the upgrades, web requests went directly to the database and the API services had a simple page cache. Unsurprisingly, the upgrades ended with the whole of Twitter running from memory, with the backing database used just as a data store to rebuild the cache on demand. The interesting thing, however, was that all the upgrades were done live, without shutting down the system. Changes were always introduced on one node first, regression issues were sorted out, and only then was the software rolled out to the whole cluster. They went as far as building a whole messaging system based on memcached APIs in order to be able to slot in such changes.
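One way to read "a messaging system based on memcached APIs" is that the queue is driven with the same two verbs a cache client already speaks: a `set` on a key appends an item to the queue with that name, and a `get` removes and returns the oldest item. The toy class below illustrates that idea in-process. The class and method names are hypothetical and this is not Kestrel's implementation or any real client API.

```python
from collections import defaultdict, deque


class MemcachedStyleQueues:
    """Toy in-process stand-in for a queue server with cache-like verbs."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def set(self, key, value):
        # Unlike a cache, "set" appends to the named queue instead of overwriting.
        self._queues[key].append(value)
        return True

    def get(self, key):
        # Unlike a cache, "get" consumes: it removes and returns the oldest item.
        queue = self._queues.get(key)
        if not queue:
            return None  # unknown or empty queue looks like a cache miss
        return queue.popleft()


queues = MemcachedStyleQueues()
queues.set("deliveries", "tweet-1")
queues.set("deliveries", "tweet-2")
first = queues.get("deliveries")
second = queues.get("deliveries")
third = queues.get("deliveries")
```

Keeping the interface this small is what makes it plausible to swap the implementation behind it on a live system: callers that only know `set` and `get` do not need to change.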
The upgrades included adding three more caching layers: a write-through vector cache of tweet primary keys with a 99% hit rate, a write-through row cache for tweets and users with a 95% hit rate, and a read-through fragment cache holding rendered versions of tweets for different clients, also with a 95% hit rate. All these caches are based on memcached. Some other interesting optimisations are that cache vectors have a limited size (800 tweets back) and that rebuilding the home page for a user is done by reading through the caches of the users they follow. As lots of people are rebuilding timelines based on the same users, this works very well according to Weaver. Another important upgrade was rewriting the queue middleware from scratch (moving from Ruby to Scala/JVM) and moving from a Ruby memcached client to a C client with optimised hashing. Altogether, these changes allowed them to increase web server performance from 3.32 requests per second without caching to 139.03 requests per second. Weaver said that API services work about four times faster than the web, which means that the API performance is roughly 550 requests/s [my calculation, not given during the talk].
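To make the vector cache idea more concrete, here is a rough sketch of a write-through cache of tweet ids capped at 800 entries per author, with a home timeline rebuilt by merging the vectors of followed users. Only the 800-tweet limit comes from the talk. The class, the plain dictionaries standing in for MySQL and memcached, and the assumption that tweet ids increase over time are all illustrative.

```python
import heapq
from collections import deque
from itertools import islice

VECTOR_LIMIT = 800  # from the talk: vectors keep the last 800 tweets


class TweetStore:
    """Hypothetical sketch; plain dicts stand in for the database and memcached."""

    def __init__(self):
        self.db = {}       # tweet_id -> (author, text)
        self.vectors = {}  # author -> deque of tweet ids, newest first
        self.next_id = 1   # assumption: ids grow monotonically

    def post(self, author, text):
        tweet_id = self.next_id
        self.next_id += 1
        # Write-through: the same write updates the store and the cache.
        self.db[tweet_id] = (author, text)
        vector = self.vectors.setdefault(author, deque(maxlen=VECTOR_LIMIT))
        vector.appendleft(tweet_id)  # a full deque drops its oldest id
        return tweet_id

    def home_timeline(self, following, count=20):
        # Each vector is already sorted newest first, so a lazy merge is enough.
        vectors = [self.vectors.get(user, ()) for user in following]
        merged = heapq.merge(*vectors, reverse=True)
        return list(islice(merged, count))


store = TweetStore()
store.post("alice", "first")
store.post("bob", "second")
store.post("alice", "third")
timeline = store.home_timeline(["alice", "bob"])
```

Because popular authors appear in many home timelines, their vectors are read far more often than they are written, which is consistent with the high hit rates quoted in the talk.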
The front end is written entirely in Ruby on Rails, and this did not change during the upgrades. The middleware is now written in a mix of C and Scala; the most important components are memcached, Varnish (cache), Kestrel (messaging, written in Scala) and a Scala comet server. They use MySQL for data storage, which also did not change during the upgrades.
Some things that surprised me in this talk:
- Web is only 10-20% of the traffic, the rest is through API services
- Web servers are still 50% of the cluster.
- Regular incoming traffic peaks are around 80 tweets per second. I expected this to be a lot more.
- Before the upgrades, their web servers served only 3.32 requests per second!
- For each tweet, a message gets inserted for each user who follows the author. On average, a user has 120 followers, so this comes to about 9600 messages/s at peak times (80 tweets/s × 120 followers).
- Message servers run on three nodes. They decided to write their own messaging software in order to make the protocol memcached-like, and did not evaluate other available solutions.
- During Obama’s inauguration, they peaked at about 350 tweets per second for around five minutes.
- They had a ton of problems with garbage collection but, strangely, haven’t looked into JRockit RT or something similar that has predictable GC. Twitter’s JVM middleware runs on the Sun JVM.
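The back-of-the-envelope numbers in this post can be reproduced with a few lines of arithmetic. The input figures are the ones quoted from the talk. The derived values are estimates only, and the inauguration message rate in particular is an extrapolation that assumes the same average of 120 followers applied during that peak, which was not stated in the talk.

```python
# Figures quoted from the talk.
WEB_RPS_BEFORE = 3.32
WEB_RPS_AFTER = 139.03
API_SPEEDUP_OVER_WEB = 4          # "about four times faster"
PEAK_TWEETS_PER_SEC = 80
INAUGURATION_TWEETS_PER_SEC = 350
AVG_FOLLOWERS = 120

# Derived estimates (not given in the talk).
web_speedup = WEB_RPS_AFTER / WEB_RPS_BEFORE
api_rps = WEB_RPS_AFTER * API_SPEEDUP_OVER_WEB
peak_messages_per_sec = PEAK_TWEETS_PER_SEC * AVG_FOLLOWERS
# Assumption: the 120-follower average also held during the inauguration.
inauguration_messages_per_sec = INAUGURATION_TWEETS_PER_SEC * AVG_FOLLOWERS

print(f"web speed-up: about {web_speedup:.1f}x")
print(f"API throughput: about {api_rps:.0f} requests/s")
print(f"regular peak: {peak_messages_per_sec} messages/s")
print(f"inauguration peak (extrapolated): {inauguration_messages_per_sec} messages/s")
```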
Evan published the slides for the talk on his blog, including the cache diagrams.
I covered QCon in detail on this blog. See other reviews and news from the conference.