Computer programs fail. We do everything we can to prevent, trap, and handle errors before they cause trouble, but there comes a time in every program's life when its assumptions about the universe are suddenly not true and there is nothing left to do but "assert False".

In a high-reliability system we must be able to correct whatever the problem is and restart from the last-known-good state of our computation, as quickly as possible, without dropping any input data on the floor.

This talk presents one approach to precise state checkpointing and restart in a high-volume data analysis application. Our application, QuietCare (http //www.quietc), is a behavioral telehealth monitoring system currently deployed in thousands of senior citizens' homes and residential facilities in the US and UK. It is implemented in Python and SQL.

The approach will be of interest to any Python programmer wanting to maximize the reliability of a database-driven application.

