Safety assurance via on-line monitoring

Abstract. This paper proposes a new approach and new techniques for on-line monitoring of concurrent programs to ensure that some of their safety properties are not violated. The techniques transform an erroneous system, one that violates a given safety property, into a new system that satisfies the property. They do so by adding a new layer that controls the scheduling of steps in the system. We formally characterize the relationship between the erroneous system and the new system. Safety monitors for the mutual-exclusion, $\ell$-exclusion, and producer-consumer tasks are presented. Proofs for the mutual-exclusion and $\ell$-exclusion tasks are given to demonstrate the applicability of our approach.
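The idea of a layer that controls the scheduling of steps can be illustrated with a minimal sketch. It is not the construction from the paper: the state model, the names `mutex_safe` and `monitored_run`, and the retry policy are all assumptions made for illustration. The monitor receives the steps the underlying (possibly erroneous) system wants to take, delays any step that would violate the safety predicate, and retries it later while preserving each process's own step order.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]    # hypothetical model: process name -> "idle" | "critical"
Step = Tuple[str, str]    # (process, local state the process wants to move to)


def mutex_safe(state: State) -> bool:
    """Safety predicate for mutual exclusion: at most one process is critical."""
    return sum(1 for v in state.values() if v == "critical") <= 1


def monitored_run(
    state: State, proposals: List[Step], is_safe: Callable[[State], bool]
) -> Tuple[State, List[Step], List[Step]]:
    """Schedule proposed steps, delaying those that would break is_safe.

    Returns (final state, steps in the order actually taken, steps never taken).
    """
    state = dict(state)
    schedule: List[Step] = []
    pending = list(proposals)
    progress = True
    while pending and progress:
        progress = False
        blocked = set()            # processes with a delayed step in this pass
        remaining: List[Step] = []
        for proc, nxt in pending:
            candidate = dict(state)
            candidate[proc] = nxt
            # A process with an earlier delayed step must wait, so that its
            # own steps are never reordered.
            if proc not in blocked and is_safe(candidate):
                state = candidate
                schedule.append((proc, nxt))
                progress = True
            else:
                blocked.add(proc)
                remaining.append((proc, nxt))
        pending = remaining
    return state, schedule, pending


if __name__ == "__main__":
    start = {"p": "idle", "q": "idle"}
    # The unmonitored system would let q enter while p is still critical.
    wanted = [("p", "critical"), ("q", "critical"), ("p", "idle"), ("q", "idle")]
    final, taken, never_taken = monitored_run(start, wanted, mutex_safe)
    print(taken)
```

In this sketch the monitor only reorders steps across processes; it never invents or drops a step unless no safe schedule can be found, in which case the leftover steps are returned to the caller. Under the same assumptions, a predicate allowing at most $\ell$ critical processes could be passed in place of `mutex_safe`.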
