Formal Specification, Verification, and Implementation of Fault-Tolerant Systems using EventML

Distributed programs are notoriously difficult to implement, test, verify, and maintain. This is due in part to the large number of unforeseen interactions possible among components, and to the difficulty of specifying precisely what the programs should accomplish in a formal language that remains intuitively clear to programmers. We discuss a methodology that has proven itself in building a state-of-the-art implementation of Multi-Paxos and other distributed protocols used in a deployed database system. This article focuses on the basic ideas of formal EventML programming, illustrated by implementing a fault-tolerant consensus protocol and showing how we prove its safety properties with the Nuprl proof assistant.
