Paper Information - An Empirical Study on the Correctness of Formally Verified Distributed Systems

An Empirical Study on the Correctness of Formally Verified Distributed Systems

Recent advances in formal verification techniques have enabled the implementation of distributed systems with machine-checked proofs. While the results are encouraging, the importance of distributed systems warrants a large-scale evaluation of those results and of verification practices. This paper thoroughly analyzes three state-of-the-art, formally verified implementations of distributed systems: IronFleet, Verdi, and Chapar. Through code review and testing, we found a total of 16 bugs, many of which have serious consequences, including crashing servers, returning incorrect results to clients, and invalidating verification guarantees. These bugs were caused by violations of a wide range of assumptions on which the verified components relied. Our results revealed that these assumptions referred to a small fraction of the trusted computing base, mostly at the interface between verified and unverified components. Based on our observations, we have built a testing toolkit called PK, which focuses on testing these parts and is able to automate the detection of 13 of the 16 bugs.
