Apr 7, 2008
Published in: Software design articles
It is fairly obvious that web site performance can be increased by making the code run faster and optimising the response time. But that only scales up to a point. To really take our web sites to the next level, we need to look at the performance problem from a different angle.
Although an average web server is able to process a few thousand requests per second, the number of requests it can actually handle at the same time is severely limited. Here are some simple figures:
So, the number of concurrently running requests per server is most likely to be in the order of tens or hundreds. To squeeze the most of a single machine, we need to avoid that bottleneck and make the requests as fast as possible. One way, that should not be overlooked, is to optimise the code and make each request do its job faster and release critical resources such as database connections as soon as possible. But that only scales up to a point.
The key to solve this problem lies in the classical definition of speed. In Physics, speed is defined as the distance divided by time required to cross that distance. So we can make the requests run faster either by decreasing the time, but we can also do it by shortening the distance. Instead of doing the whole thing quicker, we need to look into reducing the amount of work that a single request has to do. Technically, this comes down to drawing the line between what is processed synchronously (inside the request) and asynchronously (in the background) to complete the request workflow.
Clever choice of asynchronous processing is definitely one of the most important decisions in any enterprise system. This is especially true for Web sites, where it absolutely plays a key role. Here are a few ideas to think about when deciding how to split the work.
Web servers are typically configured to kill a request if it takes too long, and production servers often have much less tolerance for sluggish processing than development machines. If you get the urge to re-configure the web server in order to complete some processing, resist it by all means. The time limit is imposed for a good reason – web requests are really not suitable for longer operations. They take up scarce system resources and long request really became a point of contention for the system. A few long-running requests can slow everything down, not because they themselves are draining CPU or memory power, but because they hold on to important resources and make other requests wait for them. Also, web requests are not guaranteed to complete correctly. A server can kill the request because of a timeout. A user can also just close the window and interrupt the workflow, without sending any notification to the server.
It is miles better to just enqueue longer requests and take care of it from a background processing queue. The web response can complete the active database transaction, release system resources and return to the caller. Background services are much more robust and reliable than web requests, and they should execute long-running operations. The browser can poll the web server every few seconds and just check the status of the operation. If the user closes the browser, the operation still gets processed correctly. If the remote server is temporarily down, the background service can reprocess the request after a few minutes. You can cluster background servers and load balance longer operations. This kind of asynchronous processing is much more robust and will allow you to scale the system much easier. Clearly isolating longer operations will make sure that a few long actions do not block thousands of quick ones.
A common issue with external systems and synchronous communication is underestimating the latency. Talking about designing scalable systems, Dan Pritchett from Ebay pointed out that “One of the underlying principles is assuming high latency, not low latency. An architecture that is tolerant of high latency will operate perfectly well with low latency, but the opposite is never true.” Anything that goes out of the internal network should be handled asynchronously. This rule of thumb should really be common sense by now, but I still see it violated very often. If you process credit cards using an external payment provider, for the love of God do not try to authorise the transaction from the web response. Do not do this even if the processor is really fast. The server might work OK at the moment, but in a 24/7 environment, over the course of a few months, you have to expect that there will be remote connectivity problems. Network connections can fail, servers can start timing out and the poor users that are caught in the requests will be left hanging.
Most web servers process requests from the same session in a sequence, so the user will not be able to make a new request to the server while one of his requests is blocked, even from a different tab. To do anything useful, the user will have to close the browser and log in again. And, as the processing is caught in a blocked request, you will have no idea what actually happened. Whenever I hear about transactions getting “stuck”, it is most likely because of synchronous communication to an external server. The payment provider might charge the user’s account but your server might not actually record that the money arrived on your end.
Domain Driven Design suggests using aggregates as boundaries for synchronous processing. It would be very hard to convince anyone that your web server and the payment processor are parts of the same aggregate, regardless of how you structure the application.
The CAP theorem coined by Dr. Eric A. Brewer in ‘98 says that any system can have at most two from the following group of properties: Consistency, Availability, tolerance to network Partitioning. As the number of users grows, availability and partitioning become much more important than consistency. By (temporarily) giving up on consistency, we can make the system much faster and much better scalable. Gregor Hohpe wrote a really nice article on this subject in 2004, called “Starbucks Does Not Use Two-Phase Commit”. I strongly suggest reading it if you have not already done so.
Web applications often try to do too much at the same time. When an application needs to serve 20 or 30 thousand users over a month, doing too much might not be seen as a problem at all. But once the user base grows to a few hundred thousand, behaving lazy will significantly improve scalability. Think really hard about things that do not really have to be processed instantly. Break down the process and see if something can be postponed for later, even if it may cause slight problems. Anything that is not likely to generate a lot of problems, and the problems it causes can be easily fixed later, is a good candidate for taking out of the critical path.
My rule of thumb to check what can be left out of the primary request workflow is to ask what is the worst thing that can happen if things go bad, and how frequently can we expect that. I’ll use an online bookstore again as an example: a shopping cart checkout request should theoretically check whether the book is in stock, authorise the payment, remove a copy of the book from the available stock, create an order item in the database and send it to the shipping department. New payment industry standards, such as Verified by VISA, make it hard to process the payment offline. However, checking and modifying the stock can safely be left for later. What is the worst thing that can happen if things go wrong? We run out of stock and over-sell a bit. We can notify the user about a slight delay, reorder the book and ship it a few days later. Alternatively, the user could cancel the order and get the money back (if the transaction was just authorised and not captured, the money will never be taken from them in the first place). How frequently will this happen? Not a lot, since the bookstore should manage stock levels pro-actively. In this case, the request can just authorise the payment and put the shopping cart into a background queue. We can process the requests in the queue overnight, when the system is not under a heavy load. By avoiding to use a shared resource (stock) we avoid both the contention and simplify the request workflow.
Get practical knowledge and speed up your software delivery by participating in hands-on, interactive workshops:
Get future articles, book and conference discounts by e-mail.