An Empirical Study on the Correctness of Formally Verified Distributed Systems

Recent advances in formal verification techniques have enabled the implementation of distributed systems with machine-checked proofs. While the results are encouraging, the importance of distributed systems warrants a large-scale evaluation of both the results and the verification practices. This paper thoroughly analyzes three state-of-the-art, formally verified implementations of distributed systems: IronFleet, Verdi, and Chapar. Through code review and testing, we found a total of 16 bugs, many of which produce serious consequences, including crashing servers, returning incorrect results to clients, and invalidating verification guarantees. These bugs were caused by violations of a wide range of assumptions on which the verified components relied. Our results revealed that these assumptions referred to a small fraction of the trusted computing base, mostly at the interface of verified and unverified components. Based on our observations, we have built a testing toolkit called PK, which focuses on testing these parts and is able to automate the detection of 13 (out of 16) bugs.
