A Game-Theoretic Foundation for the Maximum Software Resilience against Dense Errors

Safety-critical systems need to maintain their functionality in the presence of multiple errors caused by component failures or disastrous environment events. We propose a game-theoretic foundation for synthesizing control strategies that maximize the resilience of a software system in defense against a realistic error model. The new control objective of such a game is called $k$-resilience. To be $k$-resilient, a system must recover rapidly from infinitely many waves of a small number of up to $k$ closely spaced errors, provided that these blocks of up to $k$ errors are separated by short time intervals that the system can use to recover. We first argue why we believe this to be the right level of abstraction for safety-critical systems when local faults are few and far between.
We then show how the analysis of $k$-resilience problems can be formulated as a model-checking problem for a mild extension of the alternating-time $\mu$-calculus (AMC). The witness for $k$-resilience, which the model checker can provide, can be used to derive control strategies that are optimal with respect to resilience. We show that the computational complexity of constructing such optimal control strategies is low and demonstrate the feasibility of our approach through an implementation and experimental results.
