Do you have services whose owners claim they run at five nines, yet you often run into errors? It is easy and convenient to build metrics at the service level, but these aggregate metrics often hide a wide array of issues that users face. Having the right metrics is a key component of building reliable products. This talk goes into the design of these metrics, using real-world examples to illustrate good and bad designs.
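To make the point about service-level metrics concrete, here is a minimal sketch in Python. The endpoint names and request counts are invented purely for illustration and are not taken from the talk. It shows how a single aggregate success rate can look healthy while one user-facing path fails far more often.

```python
from collections import defaultdict

# Hypothetical request log of (endpoint, succeeded) pairs.
# All names and volumes are assumptions made for this example.
requests = (
    [("/health", True)] * 99_000      # high-volume traffic that always succeeds
    + [("/checkout", True)] * 900
    + [("/checkout", False)] * 100    # 10% of checkout requests fail
)


def success_rate(records):
    """Service-level metric: fraction of all requests that succeeded."""
    records = list(records)
    return sum(ok for _, ok in records) / len(records)


def success_rate_by_endpoint(records):
    """Finer-grained metric: success fraction computed per endpoint."""
    totals = defaultdict(lambda: [0, 0])  # endpoint -> [successes, total]
    for endpoint, ok in records:
        totals[endpoint][0] += ok
        totals[endpoint][1] += 1
    return {endpoint: s / n for endpoint, (s, n) in totals.items()}


overall = success_rate(requests)
per_endpoint = success_rate_by_endpoint(requests)

print(f"service-level success rate: {overall:.3%}")
for endpoint, rate in sorted(per_endpoint.items()):
    print(f"{endpoint}: {rate:.3%}")
```

In this made-up data the service-level number is dominated by the high-volume endpoint, so it stays close to 100% even though one in ten checkout requests fails. Slicing by endpoint is only one possible breakdown; slicing by user or by user journey follows the same pattern.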


