Made Manifesto

IX. Cache First! (Ask Questions Later)

November 22, 2017

There are only two hard things in Computer Science: cache invalidation and naming things.

Phil Karlton, 1997

“It looks OK to me. Have you tried emptying your cache?” — Your web designer, five minutes ago.

In 2009, due to some contractual oversight, Made Media was given responsibility for hosting a companion website for one of the UK’s most popular network TV shows. In fact, said website was the most popular website for Channel 4 in the UK, partly because the show made a feature of directing people to the website, and feeding back from it, live on-air. We can testify to the traffic generated when a popular TV doctor, at prime time, points to a web page featuring a gallery of certain body parts, and suggests you might want to go online to compare yourself. The bandwidth graph has to 10X its scale axis in a second. The data centre phones you up to find out what’s happening.

At the time, we were only just dipping our toes into Amazon Web Services. What we did have, though, was a rack of web servers in a data centre in Manchester. We could dedicate one of them to this website. It was a badass server. It probably had almost as much processing power as your phone today. There was no load test. There was no backup plan.

What we did was read some articles about withstanding a denial-of-service attack. Number one on the list was to use a reverse proxy cache. The open source one that everyone was recommending was called Varnish. We installed it, made a few optimisations to the web server configuration, and crossed our fingers. Actually, two of us drove up the M6 to Manchester. I don’t know what we thought we would do. Maybe throw water over the server if it caught fire.

The website stood up just fine on that single server, due to the magic of caching. In these days of cloud computing and horizontally scaling applications, caching can get overlooked, but it’s still critical to keeping websites operational.

The truth is, most web pages do not change all that much. And when they do, they often change within predictable time ranges. Even if you’re only able to cache the information on a page for one second, that is still huge when you’re handling more than 2,000 requests per second, as happens, for example, in the middle of an Ed Sheeran on-sale.
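To put a number on that one-second claim, here is a minimal simulation sketch. It is not Varnish or any real proxy: the `TTLCache` class, the `simulate` function, and the `/on-sale` URL are all made up for illustration, and it assumes every request asks for the same page at a perfectly even rate.

```python
class TTLCache:
    """Tiny time-based cache. The clock is passed in so the run is deterministic."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get(self, key, now, fetch):
        entry = self.store.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: the origin is not touched
        value = fetch()  # missing or stale: go back to the origin
        self.store[key] = (now + self.ttl, value)
        return value


def simulate(requests_per_second=2000, seconds=10, ttl_seconds=1.0):
    """Return (total requests, requests that reached the origin)."""
    origin_hits = 0

    def render_page():
        # Stand-in for the expensive work the web server would do.
        nonlocal origin_hits
        origin_hits += 1
        return "<html>...</html>"

    cache = TTLCache(ttl_seconds)
    total = requests_per_second * seconds
    for i in range(total):
        now = i / requests_per_second  # evenly spaced arrivals (an assumption)
        cache.get("/on-sale", now, render_page)
    return total, origin_hits


if __name__ == "__main__":
    total, hits = simulate()
    print(f"{total} requests, {hits} reached the origin")
```

Under those assumptions the origin renders the page about once per second, however many visitors arrive. Real traffic is burstier and spread over many URLs, so the ratio will be less tidy, but the shape of the saving is the same.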

Most web assets are actually cacheable, whether that’s a home page, an on-sale button, a seat-availability feed, a piece of JavaScript, or a performance calendar. And you can control, quite precisely, when a cache expires, whether that’s by specifying ‘this content is good for 60 seconds’ or ‘this status expires at 10:00am exactly, when this event goes on sale’. In fact, HTTP caching is really at the core of the infrastructure of the internet. The much-vaunted Content Delivery Networks are really just networks of reverse-proxy caching servers, like Varnish. The thing is, whilst most web developers have a reasonable grasp of HTTP caching, they don’t often put the work in to make sure it’s really working as intended, because it’s kind of a drag.
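Those two kinds of expiry map onto the standard `Cache-Control` and `Expires` response headers. The sketch below shows one way to build them in Python. The helper names `headers_for_ttl` and `headers_until` and the on-sale date are invented for the example, and a real application would set these headers through whatever web framework it uses.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def headers_for_ttl(seconds):
    """'This content is good for N seconds.'"""
    return {"Cache-Control": f"public, max-age={int(seconds)}"}


def headers_until(expires_at, now):
    """'This status expires at an exact moment.'

    Both arguments must be timezone-aware datetimes.
    """
    expires_utc = expires_at.astimezone(timezone.utc)
    remaining = max(0, int((expires_utc - now).total_seconds()))
    return {
        # max-age counts down to the deadline for caches that prefer it
        "Cache-Control": f"public, max-age={remaining}",
        # Expires carries the absolute time in HTTP date format
        "Expires": format_datetime(expires_utc, usegmt=True),
    }


if __name__ == "__main__":
    # Hypothetical on-sale time: 10:00 UTC on an example date.
    on_sale = datetime(2017, 11, 24, 10, 0, tzinfo=timezone.utc)
    print(headers_for_ttl(60))
    print(headers_until(on_sale, datetime.now(timezone.utc)))
```

When both headers are present, caches that understand `max-age` use it in preference to `Expires`, which is why the sketch computes the two from the same deadline.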

The worst thing you can do, if you’re responsible for a high-demand application, is to fuse your highly cacheable content (like performance availability) with your hard-to-cache content (like a user’s personal data). Yoking these business concerns together has the side effect of making your entire application largely un-cacheable. It doesn’t matter how fancy your horizontally scaling architecture is in this circumstance, because, as the one-second example shows, caching can cut the load on your servers by a factor of 2,000 or more. Even if your cloud can scale that fast in a second (spoiler: it can’t), you’ll be paying through the nose for it.
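As a rough illustration of the difference, the sketch below builds the same information two ways. Everything in it is hypothetical: the function names, the five-second lifetime, and the JSON shapes are assumptions for the example, not a description of any particular system.

```python
import json

PUBLIC_TTL_SECONDS = 5  # assumed lifetime for availability data


def availability_response(seats_left):
    """Same for every visitor, so a shared cache can serve it."""
    body = json.dumps({"seats_left": seats_left})
    return body, {"Cache-Control": f"public, max-age={PUBLIC_TTL_SECONDS}"}


def account_response(user_name):
    """Specific to one visitor, so shared caches must not keep it."""
    body = json.dumps({"name": user_name})
    return body, {"Cache-Control": "private, no-store"}


def fused_response(seats_left, user_name):
    """Anti-pattern: one response carrying both kinds of data.

    Because it contains personal data, the whole response has to take
    the strictest policy, and the availability part loses its caching.
    """
    body = json.dumps({"seats_left": seats_left, "name": user_name})
    return body, {"Cache-Control": "private, no-store"}


if __name__ == "__main__":
    for body, headers in (
        availability_response(120),
        account_response("Alex"),
        fused_response(120, "Alex"),
    ):
        print(headers["Cache-Control"], body)
```

Kept apart, the availability response can be served from cache to everyone while only the small personal response reaches the application. Fused, every request reaches the application.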

So at Made Media we design caching considerations into the application architecture, rather than tacking on caching at the end. It makes development harder, but if the cache breaks the code, it means the code needs rethinking. This is what we mean by cache first, ask questions later.

Required Reading

  • Varnish: a caching HTTP reverse proxy.
  • Fastly: a content delivery network (CDN).