Formal verification of algorithms for critical systems

We describe our experience with formal, machine-checked verification of algorithms for critical applications, concentrating on a Byzantine fault-tolerant algorithm for synchronizing the clocks in the replicated computers of a digital flight control system. First, we explain the problems encountered in unsynchronized systems and the necessity, and criticality, of fault-tolerant synchronization. We give an overview of one such algorithm, and of the arguments for its correctness. Next, we describe a verification of the algorithm that we performed using our EHDM system for formal specification and verification. We indicate the errors we found in the published analysis of the algorithm, and other benefits that we derived from the verification. Based on our experience, we derive some key requirements for a formal specification and verification system adequate to the task of verifying algorithms of the type considered. Finally, we summarize our conclusions regarding the benefits of formal verification in this domain, and the capabilities required of verification systems in order to realize those benefits.
