Planning for change in a formal verification of the Raft consensus protocol

We present the first formal verification of state machine safety for the Raft consensus protocol, a critical component of many distributed systems. We connected our proof to previous work to establish an end-to-end guarantee that our implementation provides linearizable state machine replication. This proof required iteratively discovering and proving 90 system invariants. Our verified implementation is extracted to OCaml and runs on real networks. The primary challenge we faced during the verification process was proof maintenance, since proving one invariant often required strengthening and updating other parts of our proof. To address this challenge, we propose a methodology of planning for change during verification. Our methodology adapts classical information hiding techniques to the context of proof assistants, factors out common invariant-strengthening patterns into custom induction principles, proves higher-order lemmas that show any property proved about a particular component implies analogous properties about related components, and makes proofs robust to change using structural tactics. We also discuss how our methodology may be applied to systems verification more broadly.
