At N26, we want to make sure we have resilience and fault tolerance built into our backend service-to-service calls.

Our services used a combination of Hystrix, Retrofit, Retryer, and other tools to achieve this goal. However, Netflix recently announced that Hystrix is no longer under active development.

Therefore, we needed a replacement that maintains the same level of functionality. Since Hystrix provided a large part of our HTTP client resilience (including circuit breaking, connection thread pool thresholds, easy-to-add fallbacks, response caching, etc.), we used this announcement as an opportunity to revisit our entire HTTP client resilience stack. We wanted to find a solution that consolidated our fragmented tooling into an easy-to-use and consistent approach.

This talk shares the approach we are currently implementing and the tools we analyzed while making the decision. It aims to give backend developers (primarily those working with JVM languages) and SREs a comprehensive view of the state of the art in service-to-service call tooling (Resilience4j, Envoy, gRPC, Retrofit, etc.), mechanisms for improving service-to-service call resiliency (timeouts, circuit breaking, adaptive concurrency limits, outlier detection, rate limiting, etc.), and a discussion of where these mechanisms should be implemented (client side, sidecar proxy, server-side sidecar proxy, or server side).



The topic was very interesting, well explained, and the speaker was great. Congratulations!

Jaume at 10:09 on 6 Jun 2019

Best talk so far to me

Javi at 11:15 on 6 Jun 2019

Nice talk

Santi Muñoz at 11:45 on 6 Jun 2019

Amazing speaker and very interesting topic.

Buitaker at 18:56 on 6 Jun 2019

Good dissection and analysis of solutions out there