Machine-learning systems have become increasingly prevalent in commodity software systems. They are used through cloud-based APIs or embedded through software libraries. However, even though ML systems may look like just another data pipeline, they make systems sensitive and can put system health at risk without proper controls.

Through discussions with engineers engaged in deploying and operating ML systems, we arrived at a set of principles and best practices. These range from input-data validation (for fairness and quality in training), contextual alerting, and deployment and rollback policies to privacy and ethics. We discuss how these practices fit in with established SRE practices, and how ML requires novel approaches in some cases. We look at a few specific cases where ML-based systems did not behave as traditional systems did, and examine the outcomes in light of our recommended best practices.
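
As an illustration of the input-data validation practice mentioned above, here is a minimal sketch of a check that could run as a gate in a CI/CD pipeline before training or deployment. It is not taken from the talk: the names `FieldRule` and `validate_batch`, the range rule, and the missing-value threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Mapping


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical rule: allowed numeric range and tolerated share of missing values."""
    minimum: float
    maximum: float
    max_missing_fraction: float


def validate_batch(
    rows: Iterable[Mapping[str, Any]],
    rules: Mapping[str, FieldRule],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the batch passed."""
    rows = list(rows)
    if not rows:
        return ["batch is empty"]

    problems: List[str] = []
    for name, rule in rules.items():
        # A field that is absent from a row is treated the same as an explicit None.
        values = [row.get(name) for row in rows]

        missing = sum(v is None for v in values)
        if missing / len(rows) > rule.max_missing_fraction:
            problems.append(f"{name}: {missing}/{len(rows)} values missing")

        out_of_range = sum(
            1
            for v in values
            if v is not None and not (rule.minimum <= v <= rule.maximum)
        )
        if out_of_range:
            problems.append(
                f"{name}: {out_of_range} values outside "
                f"[{rule.minimum}, {rule.maximum}]"
            )
    return problems
```

A pipeline step could call `validate_batch` on a sample of incoming data and fail the build, or trigger a rollback, whenever the returned list is non-empty. Real deployments would likely need richer checks, such as distribution drift or per-group coverage for fairness.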



Buitaker at 14:07 on 4 Jun 2019

Add use cases and explain how to take advantage of it in CI/CD

Iva Godo at 09:44 on 5 Jun 2019

Good logical presentation and an amazing approach; I just missed examples of use from a DevOps perspective, such as CI/CD.
Excellent speakers.

Jerry Smith at 09:30 on 6 Jun 2019

Great talk by two awesome speakers.
Showed a theoretical approach to the subject including all the caveats to take into account.
Slides clear with great time control.

Pau Trepat at 09:54 on 6 Jun 2019

Great talk by two awesome speakers. Everything under control: time, talk delivery, message.
Some managers need to see this talk to understand the problems of working with data.

Santi Muñoz at 11:05 on 6 Jun 2019

Amazing talk and speakers, very useful and interesting.

Great presentation. I missed a more technical explanation of how the product can be integrated with the user's services, but maybe they would need more time for that...