Don’t be afraid of huge load spikes!

Huge traffic spikes hitting our applications can be problematic and scary, especially when we’re not prepared for them.

The common solution I see being applied is some form of auto-scaling. It’s a good and simple enough strategy: launch more instances to handle the extra traffic and retire them when the spike is over. But this automation takes time to kick in, so the scaling thresholds need to account for that delay. If we react too late, too close to the hardware limits, we risk losing instances mid-spike and, consequently, data.
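
To make that delay concrete, here’s a back-of-the-envelope sketch in Go. All the numbers are assumptions for illustration; the point is that the trigger threshold must leave enough headroom for the traffic that keeps growing while new instances boot:

```go
package main

import "fmt"

func main() {
	const (
		capacityRPS     = 1000.0 // assumed: what the current instances can handle
		growthRPSPerMin = 100.0  // assumed: how fast a spike can grow
		scaleUpMinutes  = 3.0    // assumed: time for new instances to come online
	)
	// While new instances boot, traffic keeps growing; the trigger
	// threshold must leave at least that much headroom.
	headroom := growthRPSPerMin * scaleUpMinutes
	triggerRPS := capacityRPS - headroom
	fmt.Printf("scale out at ~%.0f RPS (%.0f%% of capacity)\n",
		triggerRPS, 100*triggerRPS/capacityRPS)
}
```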

Another solution, although it’s not suitable for every type of request, is to have what’s called a Firehose event stream. Instead of variable request loads hitting your application directly, requests are first put into an event stream and only then consumed by the application. We essentially build a virtual dam, which lets us control the flow of incoming work.
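
Here’s a minimal sketch of the idea, using a Go buffered channel as a stand-in for the event stream. In production this would be something like Kafka or Kinesis; the names and sizes here are illustrative only:

```go
package main

import (
	"fmt"
	"time"
)

type Request struct {
	ID int
}

func main() {
	// The buffered channel is our dam: producers can burst into it,
	// while the consumer drains it at its own, controlled pace.
	stream := make(chan Request, 1024)

	// Simulate a traffic spike: 100 requests arrive almost at once.
	go func() {
		for i := 0; i < 100; i++ {
			stream <- Request{ID: i}
		}
		close(stream)
	}()

	// The application consumes at a steady rate, regardless of how
	// bursty the incoming traffic was.
	for req := range stream {
		fmt.Println("processing request", req.ID)
		time.Sleep(10 * time.Millisecond) // stand-in for real work
	}
}
```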

Now, I want to be clear: this dam needs to be robust, and it fundamentally shifts the problem from the application to the workers that make up the dam. But since those workers have a much simpler workflow – take the request and store it in the event stream – they can likely handle more requests per second than the application itself. The dam can probably do more with less.
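
To show how simple that workflow is, here’s a hedged sketch of one dam worker as an HTTP handler: it only reads the request, enqueues it, and acknowledges. The endpoint, buffer size, and names are assumptions, and the buffered channel again stands in for a real event stream:

```go
package main

import (
	"io"
	"log"
	"net/http"
)

// Stand-in for the event stream; illustrative only.
var stream = make(chan []byte, 1024)

func ingest(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	select {
	case stream <- body:
		// Accepted for asynchronous processing; no heavy work here.
		w.WriteHeader(http.StatusAccepted)
	default:
		// The dam is full: shed load instead of falling over.
		http.Error(w, "try again later", http.StatusTooManyRequests)
	}
}

func main() {
	// In a real deployment the application consumes the stream;
	// here a trivial goroutine stands in for it.
	go func() {
		for msg := range stream {
			log.Printf("consumed %d bytes", len(msg))
		}
	}()
	http.HandleFunc("/ingest", ingest)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Note the `default` branch: because the worker does so little, the main design decision left is what to do when the dam itself is full, and returning an explicit back-pressure signal is one reasonable choice.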

With this approach, the application won’t hit resource exhaustion, because the rate of incoming work is under our control. It won’t need to scale until it can no longer keep up with the consumer lag of the event stream, or until some other SLA or SLO we have to guarantee is at risk.
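
As a sketch of that scaling signal, assuming we can read the produced and consumed offsets from the stream, the decision reduces to a lag check against an SLO budget. The threshold and the offsets are illustrative assumptions:

```go
package main

import "fmt"

const maxLagWithinSLO = 10_000 // events; an assumed SLO budget

// shouldScaleConsumers returns true when the backlog has grown past
// what our SLO allows, signalling that more consumers are needed.
func shouldScaleConsumers(producedOffset, consumedOffset int64) bool {
	lag := producedOffset - consumedOffset
	return lag > maxLagWithinSLO
}

func main() {
	fmt.Println(shouldScaleConsumers(50_000, 45_000)) // false: within budget
	fmt.Println(shouldScaleConsumers(80_000, 45_000)) // true: falling behind
}
```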

But, as said before, this is not for every type of request. If a client needs an immediate response, this solution is incompatible. That limitation, alongside the additional infrastructure of running an event or message broker and the ingest workers themselves, is pretty much the trade-off implied.