FiLM: A Runtime Monitoring Tool for Distributed Systems

Debugging and testing distributed systems is widely recognized as difficult. FiLM is a runtime monitoring tool that checks the execution of distributed applications against LTL specifications interpreted over finite traces. Implemented within the online predicate checking infrastructure D3S, FiLM models an execution as a trace of consistent global snapshots with global timestamps, and it evaluates that trace using finite automata constructed from the LTL specifications. We prove that the generated automata accept exactly the traces that satisfy the LTL specifications. In our case study, FiLM detected an important and intricate liveness bug in a real Paxos implementation.
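
To make the idea of checking a finite trace of snapshots with an automaton concrete, here is a minimal sketch. It is not FiLM's construction or the D3S API. It assumes a hand-built two-state automaton for one illustrative property, G(proposed -> F chosen), read over a finite trace. The proposition names `proposed` and `chosen`, the function names `step` and `accepts`, and the representation of a snapshot as a dictionary of booleans are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# A global snapshot is modeled (by assumption) as a map from
# atomic proposition names to truth values.
Snapshot = Dict[str, bool]

# Hypothetical hand-built automaton for G(proposed -> F chosen)
# on finite traces: PENDING means some proposal is still unanswered.
IDLE, PENDING = "idle", "pending"
ACCEPTING = {IDLE}


def step(state: str, snap: Snapshot) -> str:
    """Advance the automaton by one global snapshot."""
    if snap.get("chosen", False):
        # A 'chosen' snapshot discharges every outstanding obligation,
        # including one raised in this same snapshot.
        return IDLE
    if snap.get("proposed", False):
        return PENDING
    return state


def accepts(trace: Iterable[Tuple[int, Snapshot]]) -> bool:
    """Run the automaton over snapshots ordered by global timestamp."""
    state = IDLE
    for _, snap in sorted(trace, key=lambda pair: pair[0]):
        state = step(state, snap)
    # The finite trace satisfies the property iff no obligation is pending.
    return state in ACCEPTING


if __name__ == "__main__":
    ok = [(1, {"proposed": True}), (2, {}), (3, {"chosen": True})]
    stuck = [(1, {"proposed": True}), (2, {})]
    print(accepts(ok), accepts(stuck))
```

In this sketch a trace that ends in the pending state is rejected, which is how a liveness-style property becomes checkable once the trace is finite. A general tool would derive the automaton from the formula rather than writing it by hand.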
