Lazy web sites run faster

It is fairly obvious that web site performance can be increased by making the code run faster and optimising the response time. But that only scales up to a point. To really take our web sites to the next level, we need to look at the performance problem from a different angle.

How much can you handle?

Although an average web server is able to process a few thousand requests per second, the number of requests it can actually handle at the same time is severely limited. Here are some simple figures:

  • With sufficient network bandwidth from the datacentre to the world, an average web server could execute several thousand requests per second, depending on the average size of the response.
  • However, servers like IIS and Apache allocate a thread to execute each individual request. Although a single thread is lightweight by nature, four or five hundred concurrent threads are quite a hit on operating system resources, so those web servers can actually execute at most a few hundred requests at the same time.
  • The actual limit might be lower than you think: by default, a single IIS 6 process creates only up to four threads per processor. You can increase the limit in the registry, but Microsoft suggests leaving it below 20. Even with a really cool eight-core system, this still gives you only 160 concurrent requests.
  • If the requests share other pooled resources, such as database connections, the number becomes even lower. Database connections are split across the whole server farm, so a single web server will most likely have only on the order of tens of connections available in its pool.

So, the number of concurrently running requests per server is most likely to be in the order of tens or hundreds. To squeeze the most out of a single machine, we need to avoid that bottleneck and make the requests as fast as possible. One way, which should not be overlooked, is to optimise the code so that each request does its job faster and releases critical resources such as database connections as soon as possible. But that only scales up to a point.

The key to solving this problem lies in the classical definition of speed. In physics, speed is defined as distance divided by the time required to cross that distance. So we can make requests run faster not only by decreasing the time, but also by shortening the distance. Instead of doing the whole thing quicker, we need to look into reducing the amount of work that a single request has to do. Technically, this comes down to drawing the line between what is processed synchronously (inside the request) and asynchronously (in the background) to complete the request workflow.

A clever choice of what to process asynchronously is one of the most important decisions in any enterprise system, and for web sites it plays an absolutely key role. Here are a few ideas to think about when deciding how to split the work.

  • Delegate all long operations to a background process
  • Never ever talk to an external system synchronously, no matter how fast it is
  • Be lazy – if something does not have to be processed now, leave it for later

Idea #1: Delegate all long operations to a background process

Web servers are typically configured to kill a request if it takes too long, and production servers often have much less tolerance for sluggish processing than development machines. If you get the urge to re-configure the web server in order to complete some processing, resist it. The time limit is imposed for a good reason: web requests are really not suitable for longer operations. They take up scarce system resources, and long requests easily become a point of contention for the system. A few long-running requests can slow everything down, not because they themselves are draining CPU or memory, but because they hold on to important resources and make other requests wait for them. Also, web requests are not guaranteed to complete correctly. A server can kill a request because of a timeout. A user can also just close the window and interrupt the workflow, without sending any notification to the server.

It is miles better to just enqueue longer requests and take care of them in a background processing queue. The web response can complete the active database transaction, release system resources and return to the caller. Background services are much more robust and reliable than web requests, and they are the ones that should execute long-running operations. The browser can poll the web server every few seconds and just check the status of the operation. If the user closes the browser, the operation still gets processed correctly. If the remote server is temporarily down, the background service can reprocess the request after a few minutes. You can cluster background servers and load-balance longer operations. This kind of asynchronous processing is much more robust and will allow you to scale the system much more easily. Clearly isolating longer operations will make sure that a few long actions do not block thousands of quick ones.
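As a rough illustration of the pattern, here is a minimal Python sketch, not tied to any particular web server; the in-memory queue and status dictionary are stand-ins for a durable message queue and a database status table, and the function names are invented for illustration:

    import queue
    import threading
    import time
    import uuid

    jobs = queue.Queue()      # stand-in for a durable message queue
    statuses = {}             # stand-in for a status table in the database

    def handle_web_request(payload):
        # Runs inside the web request: record the work and return immediately.
        job_id = str(uuid.uuid4())
        statuses[job_id] = "queued"
        jobs.put((job_id, payload))
        return job_id         # the browser polls the status using this id

    def check_status(job_id):
        # Cheap lookup the browser can poll every few seconds.
        return statuses.get(job_id, "unknown")

    def background_worker():
        # In real life this runs as a separate service; a thread keeps the sketch short.
        while True:
            job_id, payload = jobs.get()
            statuses[job_id] = "processing"
            time.sleep(5)     # placeholder for the actual long-running operation
            statuses[job_id] = "complete"

    threading.Thread(target=background_worker, daemon=True).start()

The important part is that the web request only touches the queue and the status record; everything slow happens in the worker, which can live on a completely different machine.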

Idea #2: Never ever talk to an external system synchronously, no matter how fast it is

A common issue with external systems and synchronous communication is underestimating the latency. Talking about designing scalable systems, Dan Pritchett from eBay pointed out that “One of the underlying principles is assuming high latency, not low latency. An architecture that is tolerant of high latency will operate perfectly well with low latency, but the opposite is never true.” Anything that goes out of the internal network should be handled asynchronously. This rule of thumb should really be common sense by now, but I still see it violated very often. If you process credit cards using an external payment provider, for the love of God do not try to authorise the transaction from the web response. Do not do this even if the processor is really fast. The server might work OK at the moment, but in a 24/7 environment, over the course of a few months, you have to expect remote connectivity problems. Network connections can fail, servers can start timing out, and the poor users caught in those requests will be left hanging.

Most web servers process requests from the same session in sequence, so the user will not be able to make a new request to the server while one of his requests is blocked, even from a different tab. To do anything useful, the user will have to close the browser and log in again. And, as the processing is caught in a blocked request, you will have no idea what actually happened. Whenever I hear about transactions getting “stuck”, it is most likely because of synchronous communication with an external server. The payment provider might charge the user’s account, but your server might not actually record that the money arrived on your end.
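To make that concrete, here is a minimal sketch of the same payment example; the provider_authorise stub and the in-memory SQLite table are assumptions for illustration, and a real system would call the processor's actual API from a separate background service:

    import sqlite3

    def provider_authorise(card_token, amount_pence, timeout=10):
        # Stub standing in for the external payment processor's API.
        return {"approved": True}

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE payment (id INTEGER PRIMARY KEY, card_token TEXT,"
               " amount_pence INTEGER, status TEXT)")

    def handle_payment_request(card_token, amount_pence):
        # Inside the web response: record the intent and return; never call the provider here.
        cur = db.execute(
            "INSERT INTO payment (card_token, amount_pence, status) VALUES (?, ?, 'pending')",
            (card_token, amount_pence))
        db.commit()
        return cur.lastrowid  # the browser polls the status of this transaction

    def payment_worker():
        # Background service: talks to the external system, tolerates failures, retries later.
        pending = db.execute(
            "SELECT id, card_token, amount_pence FROM payment WHERE status = 'pending'").fetchall()
        for tx_id, card_token, amount in pending:
            try:
                result = provider_authorise(card_token, amount, timeout=10)
                status = "authorised" if result["approved"] else "declined"
            except (ConnectionError, TimeoutError):
                status = "pending"   # leave it for the next run, a few minutes later
            db.execute("UPDATE payment SET status = ? WHERE id = ?", (status, tx_id))
            db.commit()

Because the outcome is always written back to the database, a connectivity problem can only ever delay an authorisation; it can no longer leave a charged card with no record on your side.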

Domain Driven Design suggests using aggregates as boundaries for synchronous processing. It would be very hard to convince anyone that your web server and the payment processor are parts of the same aggregate, regardless of how you structure the application.

Idea #3: Be lazy – if something does not have to be processed now, leave it for later

The CAP theorem, coined by Dr. Eric A. Brewer in ’98, says that a distributed system can have at most two of the following three properties: Consistency, Availability, and tolerance to network Partitioning. As the number of users grows, availability and partition tolerance become much more important than consistency. By (temporarily) giving up on consistency, we can make the system much faster and much more scalable. Gregor Hohpe wrote a really nice article on this subject in 2004, called “Starbucks Does Not Use Two-Phase Commit”. I strongly suggest reading it if you have not already done so.

Web applications often try to do too much at the same time. When an application needs to serve 20 or 30 thousand users over a month, doing too much might not seem like a problem at all. But once the user base grows to a few hundred thousand, being lazy will significantly improve scalability. Think really hard about things that do not genuinely have to be processed instantly. Break down the process and see if something can be postponed for later, even if that may cause slight problems. Anything that is unlikely to generate a lot of problems, and whose problems can be easily fixed later, is a good candidate for taking out of the critical path.

My rule of thumb for checking what can be left out of the primary request workflow is to ask what is the worst thing that can happen if things go bad, and how frequently we can expect that. I’ll use an online bookstore again as an example: a shopping cart checkout request should theoretically check whether the book is in stock, authorise the payment, remove a copy of the book from the available stock, create an order item in the database and send it to the shipping department. New payment industry standards, such as Verified by Visa, make it hard to process the payment offline. However, checking and modifying the stock can safely be left for later. What is the worst thing that can happen if things go wrong? We run out of stock and over-sell a bit. We can notify the user about a slight delay, reorder the book and ship it a few days later. Alternatively, the user could cancel the order and get the money back (if the transaction was just authorised and not captured, the money will never be taken from them in the first place). How frequently will this happen? Not often, since the bookstore should manage stock levels pro-actively. In this case, the request can just authorise the payment and put the shopping cart into a background queue. We can process the requests in the queue overnight, when the system is not under heavy load. By avoiding the use of a shared resource (stock), we both avoid contention and simplify the request workflow.
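A minimal sketch of that split, with invented function and queue names standing in for the real payment and stock services: the checkout request only authorises the payment and records the cart, while stock adjustment waits for an overnight batch.

    import queue

    deferred_carts = queue.Queue()   # stand-in for a persistent overnight queue

    def authorise_payment(card, amount):
        # Placeholder for the card authorisation, which schemes such as
        # Verified by Visa force us to do up-front.
        return "AUTH-0001"

    def checkout(cart, card):
        # The web request does only what genuinely has to happen now.
        auth_code = authorise_payment(card, cart["total"])
        deferred_carts.put({"cart": cart, "auth": auth_code})
        return auth_code             # order accepted; stock is reconciled later

    def overnight_batch(stock):
        # Runs when the system is idle: adjust stock, flag anything over-sold.
        while not deferred_carts.empty():
            order = deferred_carts.get()
            for book, copies in order["cart"]["books"].items():
                stock[book] = stock.get(book, 0) - copies
                if stock[book] < 0:
                    print(f"over-sold {book}: notify the customer and reorder")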

Image Credits: Spencer Ritenour/SXC


17 thoughts on “Lazy web sites run faster”

  1. Liked the article, but you might want to find a better metaphor to start out with!

    “The key to solving this problem lies in the classical definition of speed. In physics, speed is defined as distance divided by the time required to cross that distance. So we can make requests run faster not only by decreasing the time, but also by shortening the distance.”

  2. Very insightful article – a lot of people put a load generator in front of their website, convince themselves it can handle 10,000 TPS, but when they put the system live, it’s bogged down at 200 TPS, 5% CPU and their customers are complaining about how slow it is.

    The problem’s not just bandwidth, but network latency as well. Of course, when bandwidth is saturated, latency goes up, but even when there’s plenty of bandwidth, latency can be on the order of several hundred milliseconds. Considering a 3-way TCP handshake, request, response and teardown, a simple HTTP request can take a second from start to finish, so if your concurrency is limited to 250 threads/processes/connections, you’re limited to 250 transactions per second. It’s a big problem for Apache and Java-based web servers (not so much for IIS, where a single thread can manage many connections by way of IO Completion Ports, and not at all for event-driven servers like Zeus Web Server and nginx, which multiplex connections fully).
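    As a rough back-of-the-envelope check (this is just Little’s Law, ignoring any queueing effects): concurrent requests = throughput × time per request, so the best case here is

        throughput ≈ 250 concurrent slots / 1 second per request = 250 transactions per second.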

    The simplest way to fix the problem is to get your transactions to come in and complete faster. You can do this using a full L7 proxy in front of the servers (such as Zeus’ ZXTM software – disclaimer – I’m a lead engineer with Zeus), so that from the server’s perspective, all requests come from a fast, local source over a local network, rather than from slow, remote clients.

    ZXTM does a bunch of additional optimizations as well; most effective is HTTP multiplexing, where it channels requests from thousands of different clients down a much smaller number of established HTTP keepalive channels.

  3. It may be a weird nit to pick — but what’s up with your wordspacing? If your word spacing is greater than your linespacing for paragraph text, legibility goes down.

  4. This is a very interesting article, thanks for your great work.

    A question regarding laziness and asynchrony: they may conflict in some scenarios. Using the same example of the checkout procedure, the credit card authentication is most likely an external web service, while the order reporting is business logic on the server side.

    For laziness, we should execute the operations synchronously; it is easier to implement an atomic transaction as well.

    Or we could send the credit card authentication and the order reporting simultaneously, then merge them on the server side. It is possible that the order reporting is wasted due to a failure of the credit card authentication, but the chance is relatively small.

    Suggestions?

  5. Kun Xi,

    in the example mentioned in this post, the credit card authentication should definitely be processed asynchronously – but as part of the primary user workflow (not postponed for nightly batch processing). So the web response should just create a card transaction in the database, get the ID, enqueue it for processing and return the ID to the client. A background service should pick it up from the queue and execute it. The browser should then poll the server every 2-3 seconds to check whether the transaction is complete. When the transaction completes, the browser can forward the user to the order report page.

    As for laziness, I was not referring to programmer laziness, but to lazy web pages. It is harder to implement asynchronous architectures but they will scale much better.

  6. In your example above, assume the user enters his credit card number and clicks Place Order, and the system does as you say – gets the ID, enqueues it for processing and returns the ID to the user. The system polls and forwards the user to an order report page when complete.

    Can you elaborate a bit more on what the user sees:
    1) After he does Place Order I assume he immediately sees the order ID. correct?
    2) can he then navigate to another page to do some other processing or does he have to stay on the Place Order page until the transaction completes?
    2.1) if he can navigate away then at what point does the system fwd the user to the order report page? What is the user experience?
    2.2) if he cannot navigate, then isn’t the elapsed time that the user must spend on the Place Order page the same as, or even longer than, if he can navigate away?
    3) Do you have an example of a commercial website that behaves in the manner you describe above? Somewhere that I can make an inexpensive purchase in order to see the functionality.

    I’m trying to get a picture of the user experience as I have a similar situation with a system I’m involved with and the UI is always a bit less straightforward in the async case.

  7. Hi Bob,

    1) After he does Place Order I assume he immediately sees the order ID. correct?

    Yes, place order synchronously creates the order in the db, gets the order id, enqueues the order and commits the transaction. The order ID is returned to the client in some form (may be kept in the session if you do not want to send order IDs across the wire, or obfuscated to prevent enumeration-based attacks).
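    As a minimal sketch of the obfuscation idea (the token scheme here is purely illustrative, not something the article prescribes): hand the browser an opaque random token and keep the mapping to the real order ID on the server.

        import secrets

        order_tokens = {}                        # token -> real order id (a DB column or cache in practice)

        def public_order_reference(order_id):
            token = secrets.token_urlsafe(16)    # unguessable, so order IDs cannot be enumerated
            order_tokens[token] = order_id
            return token

        def resolve_order(token):
            return order_tokens.get(token)       # None for forged or stale tokens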

    2) can he then navigate to another page to do some other processing or does he have to stay on the Place Order page until the transaction completes?

    He can navigate away without any problems. I would imagine that most people would stay on the page to see the response, but async processing allows the users to go away or close the browser.

    2.1) if he can navigate away then at what point does the system fwd the user to the order report page? What is the user experience?

    If the user navigates away, they are never forwarded to the order report page – there is nothing to poll the order status and show results. They can, however, go to their orders page and look at past orders. Most sites would send a confirmation e-mail when the order is processed as well.

    2.2) if he cannot navigate then isn’t the elapsed time that the user must spend on the Place Order page the same, or even longer, than if he can navigate away.

    The async architecture does not shorten the time that a user looks at order processing page. It is there to help the server. So if the order processing takes 20 seconds, the user will still look at it for 20 seconds, or even a few seconds longer because of the poll frequency. But when you look at it from the server side, instead of a single order being taken in those 20 seconds, you can have hundreds of orders coming in with the same hardware resources. If you process the request synchronously, a single request will hold on to the database connection, server thread and other key resources for those 20 seconds, blocking all other requests that might want to use those resources. If you process the request asynchronously, and poll for result every 4 seconds, instead of one long request that same user will issue five quick ones. Meanwhile, hundreds of other requests would be able to utilise that database connection and complete within those 20 seconds.

    Do you have an example of a commercial website that behaves in the manner you describe above? Somewhere that I can make an inexpensive purchase in order to see the functionality.

    This is a private blog, so I cannot really disclose the names of any of my clients. But in general, if you go to the web site of any larger UK bookmaker, there is a high chance that you will see such processing.

  10. When looking into bandwidth, we mostly consider the capacity of the server’s backbone connectivity. Just providing high-capacity server connectivity is often not enough for most publicly available applications.

    Imagine that some of your users are connecting via really low-bandwidth lines such as dial-up, mobile devices, etc. Then the size of your pages (content size) can be a performance killer, since every request will take more time than you originally planned. These long-running requests will eat up the available connections, making the system slower for others. This is another important aspect we should look at when designing. Keep your pages less complex, with the most used content; the rest of the content can be made available with drill-down links.

  9. Assume the web client submits a request that, once at the server, submits a request for a long-running TX to an external system (credit card authorization). I assume the flow is something like:

    1. browser sends request to server
    2. server:
    2a creates some type of request-id
    2b stores request (or whatever it needs to) in DB
    2c puts request in queue to be processed by external system
    2d returns to client sending request-id
    4. At browser, user sees request-id and progress indicator “transaction being processed, please wait”
    4a. At this point I’d probably be using ASP.Net’s callback mechanism
    5. At browser, one of the following happens:
    5a. Request completes successfully and you’re redirected to success page
    5b. Request completes in error and you’re redirected to error page
    5c. Request times out and you’re redirected to timeout page
    5d. user navigates away and must go to appropriate status page to see what happened
    6. At the same time as 4, the server is polling for completion of request in queue
    7. When it gets completion response back (ok, error, timeout), we resume at 5

    Questions:
    1. Does this sound about right
    2. Any steps that I’ve missed
    3. Do you know of a good article that shows this as a sequence diagram or other design-type construct
    4. If the user has navigated away from the “transaction being processed, please wait” page, how does the server know not to do a redirect
    5. How does the server make the connection (correlation) between the request in the queue and the browser session that may need to be redirected

  10. Hi Bob,

    the server should not be polling anything. The whole idea is to delegate the processing from the web server to a back-end system. So the web server enqueues the request and forgets about it. The browser polls the server (if the user does not navigate away) and the server then retrieves the operation status. If the status is “in processing”, the browser polls again after a few seconds. If the status is “complete”, then the result is displayed or the user gets forwarded to some status page.
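    As a rough sketch of that polling loop (the /order-status URL and the JSON shape are invented for illustration, and in a real page this would of course be JavaScript rather than Python):

        import json
        import time
        from urllib.request import urlopen

        def wait_for_order(order_id, base_url="https://shop.example.com"):
            # Poll the status endpoint every few seconds until the background job finishes.
            while True:
                with urlopen(f"{base_url}/order-status/{order_id}") as response:
                    status = json.load(response)["status"]
                if status in ("complete", "failed"):
                    return status        # display the result or forward to a status page
                time.sleep(3)            # still "in processing": try again in a few seconds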

    As for your questions 4 and 5, the server does not care. If the user navigated away, they should be able to go to the orders page and see the status of the order (you might also want to send them an e-mail when the order completes). The correlation between the browser session and the request is established by the browser requesting the status of that particular order (using the request-id sent to the browser in step 2d of your description).

  11. I’m almost there. I think I’m missing just a small bit of the picture regarding how the server handles the long-running external request.

    Here’s the situation:
    1. browser sends request to webserver
    2. app on webserver submits a request to an external system that may take a while (let’s say possibly 20-30 minutes). I have no control over the external system but have to account for it.
    3. browser polls for a limited time (30 seconds) and after that displays a message telling user to check order status page and maybe redirects to some other page.

    I guess my question has to do with the server thread that made the request to the long-running external service and that has now been ‘forgotten about’ by the browser.

    1. Is the request from the server to external system typically synchronous or asynchronous?
    -If sync, the server thread sits and waits.
    -If async, then the server must be able to poll the external process for progress or the external process must provide some type of call back.
    2. What is standard practice, if there is such a thing as standard? What do you see as the more common architecture?

  12. Hi Bob,

    first of all, I would not execute the external request from the web server at all. Ideally, I’d delegate that to another server. Depending on your server architecture (and whether you use a web server or an app server to process web requests), you might be able to launch background threads in the web server itself, but again the goal is to relieve the web servers of as much stress as we can – and delegate long-running work to background operational servers.

    Regarding the other part, it really depends on the external service. Some card processors run synchronously, and some send you a notification when the transaction has been authorised (so typically you don’t poll; they call you when the job is done).

  13. Thanks a lot for your feedback. Things are much clearer. My situation is already somewhat as you recommend.

    My current server environment is as follows (3 servers):
    - webserver running IIS and hosting the app components that invoke the long running TX
    - DB server running SQL Server
    - Another server (server X) which runs a third party software package that communicates to external systems. This is what processes the long running TXs. They are processed synchronously.
    - App component on webserver sends request to server X which processes it and returns whenever it’s done. App component then writes results to DB.

    My goal is to keep the webserver as unburdened as possible and keep page response time relatively constant. To do that, it appears that I’ll have to implement browser “are you done” polling. The real work would be, and is, done on server X, which calls the external system, but the app thread on the webserver would still be waiting synchronously for the response from server X.

    I could probably wrap the long running TX on server X and invoke it asynchronously from the app on the web server but as the real work is being done on server X and not the webserver I’m not sure if that would buy me much.

  14. Hi gojko,

    Your thoughts in this article are very good points for architects designing their products to consider, and they have been covered very well under three good ideas.

    I have one question:

    1) With regard to asynchronous processing, you have given the example of credit-card authentication. But in the case of payment using internet banking, how can we make it asynchronous? Maybe this is not an appropriate question, but can that process be made asynchronous?

    Payment through internet banking involves going to the bank’s site from the main shopping website, logging in with authentication, and then processing your card details and verifying the transaction password.

    Related to the same topic, I can think of another example: flight booking. Say, for example, there is just one ticket left for you to book, from Moscow to Chicago. You place an order, an order id is created (as in your example), and the authentication process is delegated. Between creating the order id and the authentication, another thread (user request) could create another order id for the same ticket. If the second thread is fast enough, the first user cannot get the ticket, which is not a desirable situation. Also, there may be a case where the first user’s money has been debited but the ticket not issued, because the second user used a credit card (instead of internet banking) and his card processing ended sooner.

    Isn’t this whole transaction of booking atomic? Placing an order for a ticket and authentication of the process?

    Or, would we allow only one order id to be created and maintained in the database at a time to avoid this situation?

    Please let me know your thoughts.

  15. Hi Hari,

    the internet banking example you suggested is asynchronous by its nature, since two sites are involved in the workflow. I typically design web wallet payments so that the main merchant site starts a payment operation in the system, then the request returns back to the browser, which polls the server for updates. When the backend server has sent all relevant details to the wallet and received back the request confirmation, it updates the transaction status to “ready for redirect”. The client app in the browser, upon getting that status, reads the information required for redirection and forwards the user to the wallet web site. Upon completion, the wallet forwards the user back to the payment status page, where the browser polls the server again periodically for status updates. The wallet system will also typically send the approval to the merchant server, which then updates the transaction status to “complete”. The workflow is a bit different for wallets that do not send backend-to-backend updates, but that difference is not really important for this story.

    I’d organise the ticket example by reserving the ticket on your end, then sending off the payment request, and if that request succeeds then confirming the ticket reservation. If the payment fails, then releasing the ticket. Reserving a ticket would be a short atomic DB operation, so the concurrency of that would be controlled by the database (even if two reservation requests come at the same time, the DB will refuse one transaction or block it until the first one finishes). The speed of payment processing then does not matter.
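    A minimal sketch of that reservation step (the table and column names are invented for illustration): the conditional UPDATE either claims the last ticket or touches no rows, and the database guarantees that only one concurrent request can win.

        import sqlite3

        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, reserved INTEGER DEFAULT 0)")
        db.execute("INSERT INTO ticket (id) VALUES (1)")     # the single remaining seat
        db.commit()

        def reserve_ticket(ticket_id):
            # Atomic claim: exactly one of two concurrent requests will see rowcount == 1.
            cur = db.execute(
                "UPDATE ticket SET reserved = 1 WHERE id = ? AND reserved = 0", (ticket_id,))
            db.commit()
            return cur.rowcount == 1

        def release_ticket(ticket_id):
            # Called if the payment later fails.
            db.execute("UPDATE ticket SET reserved = 0 WHERE id = ?", (ticket_id,))
            db.commit()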
