We all know not to poke at alien life forms on another planet, right? But what about metrics: do you know how to pick, measure, and draw conclusions from them? In this talk we will cover various Site Reliability Engineering topics, such as SLIs and SLOs, while exploring real-life examples of defining and implementing metrics in a system. We will use Prometheus, an open-source system monitoring and alerting platform, to demonstrate the implementation. Let's get back to some real science.


Rob Wilson at 15:59 on 6 Oct 2018

Great talk. A lot of information to digest, but really great, and a few things I will be applying to an API I’m building for our partners (acronyms are great; SLIs and SLOs).

Ken Guest at 17:00 on 6 Oct 2018

Fantastic talk, learnt a lot and this also ties in nicely with the scaling talk earlier re metrics.

Andy Gaskell at 13:38 on 7 Oct 2018

Great talk from Rafael, both high level principles and practical notes on implementation.

Really different from the talk I last saw him give at J & Beyond in Prague a few years ago, so that was an interesting contrast.

Patryk Zajdler at 10:44 on 8 Oct 2018

Great talk. Despite the amount of information it wasn't overwhelming at all.

Richard Black at 10:58 on 8 Oct 2018

Some really useful ideas here, and pitched at just the right level to give everyone something to take away