If there was a bestseller chart for buzzwords, Serverless would currently be at the top. The interwebs are full of heated debates on how it’s the hottest new thing since the Sun, and rebuttals that it’s just a return to old two-tier architectures. The only thing attracting more controversy than the technology is the name. Twitter is buzzing with sarcastic comments about how serverless involves more servers than ever. There are ideas to stop talking about serverless and start talking about servicefull, then there’s also a half-serious proposal to rename the whole thing to Jeff.
If you’ve been sleeping under a rock for the last year, Serverless (or Jeff) is the next step in the journey from physical hosting to software in the cloud. The most common definition is that it allows you to set up a piece of code to be executed as a reaction to an event. With an elevator pitch as exciting as waiting for a delayed train, it’s no wonder that the most common reaction is ‘So what?’. Serverless, at first, might seem like one of those silly innovations that recycle old ideas and append ‘… in the cloud’. And it’s no wonder that people are scratching their heads at all the fanboy excitement. Waiting for a delayed train, even if in the cloud, is still boring as hell. Anyone working in financial systems will yawn and say that they had message queues decades ago. Oracle database developers will be all smug and say that they had triggers even before that. Data grids, such as Coherence or Gigaspaces, provided all that, in the cloud, ten years ago.
All that talk might be quite amusing, but it’s missing a crucial point: financial incentives. The technical capabilities of serverless might not be that exciting, but the financial side sure is. It’s the first time in the twenty years I’ve been making software for money that a deployment architecture actually creates strong financial incentives for good design practices, and clear financial penalties for bad design. And that, for me, is the thing that’s really revolutionary about serverless. The finance is much more important than the tech!
In the olden days before dotcom booms and busts, companies mostly operated their own hardware and hosted systems on their premises. There was nothing quite like the smell of a new Sun server in the morning. System administrators literally held the keys to the kingdom. Then, some time around year 10 Before Cloud, collocation became a thing. Smaller companies could just rent a few rack spots in a large data centre instead of operating their own. Soon after, someone figured out that it’s a hassle to bring machines in and out of data centres, and offered renting hardware as well. Small operators didn’t need to worry about buying, fixing and replacing machines any more. The next big breakthrough came from understanding that people didn’t necessarily need all the resources of their physical machines all the time. Someone figured out a way to parcel out virtual computing capacity and rent it out, and soon after, a bookstore really nailed how to do that at scale. Jeff Bezos said ‘Let there be Cloud’. But in the year 1 of Amazon Dominance, it was all messy and painful. Virtual machines would get stuck, but there were no LED displays to tell you about it. Load balancing, capacity prediction and failovers became way more complicated than with physical hardware. Somewhere around the year 5 A.D. providers caught up, and started to rent out platforms instead of virtual machines. Google App Engine and Heroku allowed software companies to focus just on building the server processes, and not worry too much about scaling, monitoring, restarting and balancing. But for all the technical revolution during that period, the financial incentives mostly stayed the same.
When I worked with companies running stuff on their own hardware, we cared a lot about keeping that hardware busy. A process that didn’t kick off often, such as a nightly job, would never get a machine just for itself. We’d bundle a bunch of those things and put them on the same box. Payment processors for different providers would run on the same set of machines, so we could easily secure them, monitor them, and keep spare capacity in case of spikes. Bundling stuff together was the only reasonable way of running everything at reasonable cost.
The early cloud changed a lot, but the incentives to bundle stuff together survived. In 2008 and 2009, I had the pleasure of working with an amazing team on a multiplayer gaming system, deployed to the cloud. We rented virtual machines from Amazon and paid for each hour a VM was running, so we cared a lot about keeping those VMs busy. Something that needed to run for only a few minutes per hour would never get its own virtual machine. Provisioning was cheap and, compared to the age before cloud, lightning fast. But it still took a good few minutes to get a new VM up and running, so we had to keep spare capacity reserved for failovers. To avoid wasting money, we ran a bunch of secondary services on failover virtual machines all the time.
Renting platforms instead of VMs was amazing, but even then the incentives to bundle stayed. The first version of MindMup went live on Heroku, and boy was that easier than managing on the early cloud. We no longer cared about scaling up and down, and didn’t need to write any infrastructure operations software. Virtual machines were out, Heroku Dynos were in, though I did not really grasp the difference between the two. We rented dynos from Heroku, and cared a lot about keeping those dynos busy. MindMup supports dozens of conversion formats. Some are used a lot, such as PDF, but some are only executed a few times a day, such as markdown outlines. If we rented a separate dyno for each of them, plus at least one for failover, we’d have to pay an order of magnitude more for running the system than if all those services were bundled together. Keeping each service in its own dyno would also have made the maintenance of the whole thing crazy. So we bundled things together.
For decades, all the books on good software design talked about building modular code and keeping unrelated modules loosely coupled, but the deployment architectures fought us on that. The financial incentives, from hardware servers to virtual machines to platforms in the cloud, stayed the same. Developers are in control of the server processes, and have to reserve computing capacity for server apps. Microservices, service-oriented architectures and modularisation in general cause apps to become a bunch of building blocks, most of which are not necessarily on the critical execution path. Secondary services may sit quietly a lot, but they still require reserved computing capacity, and a bit more reserved for failover. Services that need to handle spikes require even more reserved capacity for load balancing. When something is reserved, you have a clear financial incentive to keep it busy. Small modules get bundled into server apps, where they can share reserved capacity. Apps get deployed to the same virtual machines, where they can share capacity. And just like that, those decoupled modular services start to potentially influence and disrupt each other.
Here’s a trivial example from MindMup: some export processes are memory hogs, such as the PDF one. Others are not memory hungry at all, for example the SVG one. Some are very popular, but there is also a long tail of export formats that are not needed frequently. Reserving a separate set of dynos on Heroku for each format would make no sense financially. So we built a single server process, which waited for messages on a task queue, and then invoked the right exporter depending on the message. We could then scale that process easily to increase export capacity and handle spikes. But a stupid bug in an unimportant exporter caused those dynos to get stuck, disrupting everything. Stupid mistake, and we should have known better, but the financial incentives pushed us to bundle everything together.
Theoretically, containers solve this problem. Instead of bundling services in the same dyno or virtual machine, we could have deployed each service into a separate container, and then bundled a bunch of containers into virtual machines. That would give us easier scaling, faster response times, commercially effective use of reserved VM capacity, and isolation very close to what a full OS would provide. But using a container farm, even if deployed on a cloud, would force us to take a huge step back to managing our own infrastructure. To do it well for a massive app at scale, we’d have to spend time on orchestration tooling, bundling containers, managing environments and a bunch of other things. We’d have to build our own Heroku on top of AWS. Then Lambda came along and solved this problem.
AWS Lambda effectively implemented the container farm idea – small, isolated services, each in its own container, just without the management headache. We can just wire up a small isolated task to run when an event happens, such as someone requesting a markdown outline export. Lambda will provision, monitor, scale, reuse and restart containers. It provides isolation almost as good as if everything was in a separate VM, and runs it almost as quickly as if we reserved a separate server app for each service. But we don’t have to reserve anything, which is the key.
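To make that concrete, here is a minimal sketch of what such a wired-up task might look like, assuming a Python runtime; the handler name and event shape are illustrative, not MindMup’s actual code:

```python
# Minimal sketch of an AWS Lambda handler. Lambda provisions the container,
# invokes this function once per event, and bills only for the time it runs.
# The event shape here (format, map_id) is a hypothetical example.

def handler(event, context):
    # An event might be a queue message requesting an export,
    # e.g. {"format": "markdown", "map_id": "abc123"}
    export_format = event.get("format", "pdf")
    map_id = event.get("map_id")
    # ... perform the actual export work here ...
    return {"status": "ok", "format": export_format, "map": map_id}
```

There is no socket listening, no queue polling loop and no process supervision in the code itself; all of that is the platform’s problem.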
That’s where the ‘less’ part of serverless comes in. Sure, there is a server process waiting for incoming packets on some TCP port, but my task doesn’t care about that. It is serverless the same way WiFi is wireless. At some point, the e-mail I send over WiFi will hit a wire, of course, but I don’t have to pull along a cord with my phone. With serverless technologies, developers no longer have to worry about running a server process that listens on a socket, waits on a queue or dispatches tasks. Now comes the interesting financial aspect: without control over the server process, there’s no point in reserving capacity. When there’s no reserved computing capacity, there’s nothing to worry about keeping busy. We just need to write the part that responds to a user request, and pay only when it actually runs. And sure, we can still bundle different modules together, but there are clear financial incentives against that.
AWS charges for Lambda based on the time a task spends executing, rounded up to 100 milliseconds, multiplied by the configured upper memory limit. So, for example, if we bundled all the exporters for MindMup into a single Lambda function, we’d have to configure it with the highest needed memory limit, so that PDF exports can execute quickly. That means that we’d end up paying five or six times more than we need for SVG exports, which don’t need nearly as much memory. The financial incentives of the platform push us to keep things modular, and break each of those tasks into different Lambda functions. As a result, loose coupling really stays loose.
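A back-of-the-envelope model of that pricing makes the penalty for bundling obvious; the per-GB-second rate and memory figures below are illustrative, not AWS’s current price list:

```python
# Rough sketch of the Lambda billing model described above.
# Rates and memory sizes are illustrative assumptions.

def lambda_cost(duration_ms, memory_mb, invocations, price_per_gb_second=0.00001667):
    # Billed duration is rounded UP to the nearest 100 ms, and multiplied
    # by the *configured* memory limit, not the memory actually used.
    billed_ms = -(-duration_ms // 100) * 100  # integer ceiling to 100 ms
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * price_per_gb_second * invocations

# The same lightweight SVG export, bundled with the PDF exporter and so
# configured with 1536 MB, costs six times more per run than if it were
# a separate function configured with 256 MB.
bundled = lambda_cost(duration_ms=180, memory_mb=1536, invocations=1)
separate = lambda_cost(duration_ms=180, memory_mb=256, invocations=1)
```

Because you pay for the configured limit, every memory-light task bundled with a memory-heavy one is billed at the heavy rate, which is exactly the incentive to split them apart.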
Another crucial financial change is that Lambda removes the penalty for multi-versioning. Before Lambda, if we wanted to have the old and the new version of a service running side-by-side, we’d have to reserve capacity for both versions, so the costs would typically double. Running a good A/B test would also increase deployment costs, as would experimenting with a feature. Each experiment requires reserved capacity. Sure, we can bundle them together to save money, but that’s a can of worms from a system design perspective. Feature switches and toggles make spaghetti out of our code, and problems in the experimental version can bring down the non-experimental flows as well. So people have to carefully make trade-offs between deployment costs and isolation risks.
With serverless modules, there is no special financial cost to any of that. Dividing users into two groups and directing them to different versions costs exactly the same as keeping all those users on a single version. We only pay for the code when it executes, and in both cases the code will execute the same number of times. Better still, Lambda comes with multi-versioning out of the box. Each deployment gets a different numerical index, and we can always call any version of any function we’ve ever deployed, just by including the version index in the call. Feature experiments become relatively easy, and we don’t have to pollute the code with toggles and switches to save cash.
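One way such a split might be wired up is to route a fixed share of users to the experimental version deterministically; the version numbers and the routing helper below are hypothetical, but since Lambda lets you invoke any past deployment by its numeric version, no toggles are needed inside the functions themselves:

```python
# Sketch of deterministic A/B routing between two deployed Lambda versions.
# Version numbers and the 10% experiment share are illustrative assumptions.
import hashlib

def pick_version(user_id, experiment_share=0.1, stable="41", experimental="42"):
    # Hash the user id into one of 100 buckets, so the same user always
    # lands on the same version, without storing any assignment state.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return experimental if bucket < experiment_share * 100 else stable
```

The caller would then pass the chosen version number along when invoking the function, and both groups cost exactly the same to run, since each request executes exactly once either way.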
Sure, Lambda is no silver bullet. Magical infrastructure increases the risk of integration problems, and lack of control over server apps requires re-thinking around sessions and authorisation. Configuration becomes a lot more complex. Any technology that solves one set of problems introduces another, it’s just a question of what kind of problems you want to deal with. Serverless platforms, such as Lambda, Whisk, Azure Functions or Google Cloud Functions, are a major step towards leaving infrastructure problems to companies that are much better positioned to deal with them than most other people out there. I’m incredibly excited about all the possibilities that these platforms open up to smaller teams. Technically it may not be anything especially new, but you no longer need the budget of an investment bank to get all the benefits of tiny modules in isolated containers, nor do you have to spend years rewriting Heroku.
So, before dismissing serverless as just another fad buzzword, consider the financial incentives as well. And excuse me for sounding like a fanboy on drugs, but the fact that the platform removes the penalty for loose coupling and multi-versioning, and even provides financial benefits for keeping unrelated things separate, is pretty game-changing.
If this sounds exciting to you as well, and you’d like to try it out, come to one of the serverless code camps I’m organising all over Europe in the next few months. The camps are community events, helping people take the first steps with serverless architectures through peer-pressure learning. Check out serverless.camp for more information!