Latency: from linear growth to exponential decay with the number of backends

We’ve already discussed how we measure latency and that we use p95, p99, and sometimes p999.

Let’s look at what happens when we have multiple backends. For simplicity, I’m going to use the same latency profile for all backends.

You can think of these backends as Redis instances.

Sequential access is not great.

Imagine that when a request comes in, we handle it like this:

  1. Some logic
  2. Access a microservice
  3. Some extra logic
  …
  N. Response
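A minimal asyncio sketch of this sequential flow (the service names and latency profile are made up for illustration):

```python
import asyncio
import random

async def call_backend(name: str) -> float:
    # Simulated backend call; latency drawn from a made-up profile.
    delay = random.uniform(0.01, 0.05)
    await asyncio.sleep(delay)
    return delay

async def handle_request(backends: list[str]) -> float:
    total = 0.0
    for name in backends:          # one await per backend, strictly in order
        total += await call_backend(name)
    return total

# total latency is the *sum* of the individual latencies
print(f"{asyncio.run(handle_request(['svc-a', 'svc-b', 'svc-c'])):.3f}s")
```

Because each `await` only starts after the previous one finishes, a slow outlier at any step is added to the total rather than hidden.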

eca140fdef760cf451bbd3312db61426.png

In this case, tail latencies stack up linearly, as expected.

dea0d741d7b4735c9ea55f70b190870e.png

Can we do anything about it?

Yes, we can! If we refactor our app to call all downstream microservices in parallel, our IO latency is capped by the slowest response of a single service!
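A sketch of the same request with the calls fanned out via `asyncio.gather` (again with made-up names and a simulated latency profile):

```python
import asyncio
import random
import time

async def call_backend(name: str) -> float:
    # Simulated backend call; latency drawn from a made-up profile.
    delay = random.uniform(0.01, 0.05)
    await asyncio.sleep(delay)
    return delay

async def handle_request(backends: list[str]) -> list[float]:
    # Every call is in flight at once; we block only for the slowest one.
    return await asyncio.gather(*(call_backend(b) for b in backends))

start = time.monotonic()
delays = asyncio.run(handle_request(["svc-a", "svc-b", "svc-c"]))
wall = time.monotonic() - start
# wall-clock time tracks max(delays), not sum(delays)
print(f"max={max(delays):.3f}s sum={sum(delays):.3f}s wall={wall:.3f}s")
```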

20cb95c377d371c3dcb250515bbcd2a4.png

In that case, tail latency would grow roughly logarithmically with the number of backends:
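A quick back-of-the-envelope check of why the growth is roughly logarithmic (assuming independent backends and, purely for concreteness, an exponential latency tail):

```python
import math

def required_per_backend_quantile(n: int, target: float = 0.99) -> float:
    # The fan-out is as slow as its slowest call, so to make the *max* of n
    # independent calls fast 99% of the time, each individual call must be
    # fast with probability 0.99 ** (1/n).
    return target ** (1.0 / n)

def p99_of_max_exponential(n: int) -> float:
    # For an exponential tail P(latency > t) = exp(-t), the q-quantile is
    # -ln(1 - q), so the fan-out's p99 grows roughly like log(n).
    q = required_per_backend_quantile(n)
    return -math.log(1.0 - q)

for n in (1, 2, 4, 8, 16):
    print(n, round(p99_of_max_exponential(n), 2))
```

Each doubling of the backend count adds about the same constant (≈ ln 2) to the p99 under this model, which is exactly logarithmic growth.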

a4fcf903c449269d5ffbbc0e7e0d9f92.png

If we look at what sequential IO gives us, the picture becomes clearer:

5e942b66546fadbc6dc1d23b2d360e29.png

It’s still slower than having just one microservice to access, but it’s a very significant improvement.

Can we do something more about it?

Parallel microservices access

Yes, we can! If we use multiple replicas, we can call all of them in parallel and, as soon as we receive the fastest answer, cancel all other in-flight requests. That gives us a very nice p99.
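The race-and-cancel pattern can be sketched with `asyncio.wait` and `FIRST_COMPLETED` (replica names and latencies are again made up):

```python
import asyncio
import random

async def query_replica(replica: str) -> str:
    # Simulated replica read with a made-up latency profile.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return f"answer from {replica}"

async def fastest_of(replicas: list[str]) -> str:
    tasks = [asyncio.create_task(query_replica(r)) for r in replicas]
    # Return as soon as any one task finishes...
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:           # ...and cancel the slower in-flight requests
        task.cancel()
    return done.pop().result()

print(asyncio.run(fastest_of(["replica-1", "replica-2", "replica-3"])))
```

Cancelling the losers matters: without it, the extra requests keep consuming backend and network capacity after they can no longer affect the response.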

b4e03fcf8c2e2639915c8e6437c8afcc.png

Please also note that we get most of the benefit from having just a few (around 3) read replicas.
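A one-line calculation shows why (assuming replica latencies are independent, which real systems only approximate): the fastest-of-k answer misses the per-replica p99 only when all k replicas are simultaneously slow.

```python
# Probability that *every* replica lands in its own slow 1% tail at once.
# Independence is an idealization: correlated slowness (a hot key, a shared
# network link) would make these numbers less rosy.
for k in (1, 2, 3, 4):
    print(k, 0.01 ** k)
```

With 3 replicas that probability is already about one in a million, so a fourth replica buys another factor of 100 on an already negligible number.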

Improving sequential microservices access

So for cases when we can’t change the sequential nature of our code, we can add read replicas and expect some improvement: at each step, we send requests to all read replicas in parallel, and as soon as we get the first (i.e. the fastest) response, we move on.
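Under the same assumptions (hypothetical service names, simulated latencies), the steps stay strictly sequential, but each step now takes the minimum over its replicas instead of a single draw:

```python
import asyncio
import random

async def query_replica(backend: str, replica: int) -> str:
    # Simulated replica read with a made-up latency profile.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return f"{backend}/r{replica}"

async def fastest(backend: str, n_replicas: int) -> str:
    tasks = [asyncio.create_task(query_replica(backend, i)) for i in range(n_replicas)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:           # drop the slower in-flight requests
        task.cancel()
    return done.pop().result()

async def handle_request(backends: list[str]) -> list[str]:
    results = []
    for backend in backends:       # call order is still strictly sequential
        results.append(await fastest(backend, n_replicas=2))
    return results

print(asyncio.run(handle_request(["svc-a", "svc-b"])))
```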

9fd2e11c22ba295137ea154a1c7cebb4.png

In general, it looks like for practical purposes with a single backend, 2 read replicas do quite well.

But it’s not great.

70c9b10ead33b3d4afa03b1763b1a54f.png

Improving parallel microservices access

If you call multiple read replicas in parallel, you get a remarkable improvement in terms of latency.

6f6a2905100b60581705119b2ab17595.png

There are a few points to consider:

  1. We are already fast with parallel calls. Do we need to be faster? Is it worth it?
  2. We likely have replicas for fault tolerance. Can we use them as read-only replicas?
  3. Network overhead. If you are close to network stack saturation on your nodes, does it make sense to double or triple the load?

020bbcd177cf2405973b25bdfcabb690.png