Bugs, corrupt data, and performance issues in web applications are often recognized far too late. In the worst case they are reported by a customer, by which point they have probably already done serious damage: frustrated the user, cost you their trust, or even corrupted their data. Finding or recognizing these bugs early gets especially hard if your application makes heavy use of background processes, daemons, or cronjobs. They might throw exceptions that end up buried somewhere in the logs, and no one will be aware of them until someone looks into the log files. I want to show a way out of this misery and provide different solutions in the form of practical examples. These cover different levels of monitoring - from simple text logs on the servers up to a fully monitored application including hardware monitoring, extensive metrics, indexed and searchable logs of the whole environment, performance analysis, and alerts if something odd happens. I'll show different examples and give ideas on when such a fully monitored solution is a good idea and when "light monitoring" is sufficient.

Comments

The talk should be good for people who want to work with logs a bit more wisely.

Sven, you've shown examples of the log collection configuration, but not the metrics configuration. I think adding some examples of very basic trigger configuration in the tool you prefer would give your talk a greater call-to-action effect. For example, you could show how to configure a trigger that catches fatals from the logs.

Thanks, Sven, for your talk. It was interesting and showed a way to start working with logs. That said, it's more an explanation of where you are now at Shopware...
I think it would be interesting to present where you started and why you chose these metrics... A more personal and deeper analysis of your solution (and not an explanation of how to use the tools you chose).