Formal modeling and verification of distributed failure detectors

Model checking is a systematic way of checking for the absence of errors in a distributed system, i.e., of assessing whether the system meets its functional requirements. The field faces several challenges, however, among them developing faithful abstract models and generalizing or guaranteeing results on their basis, the limited capacity of model checking tools and computational resources, and identifying all requirements and specifying them accurately. To understand and meet such challenges, the prominent model checking techniques need to be applied to a variety of distributed systems designed for different communication models. This thesis takes up that challenge and discusses and addresses the issues encountered along the way. The reported results make the case for applying model checking as a debugging technique: we report bugs and propose fixes, and where an algorithm is ambiguous we reconstruct it. We model check both the fixed and the reconstructed algorithms. We assess the following protocols:
• Accelerated heartbeat protocols,
• Consensus protocols in asynchronous distributed systems,
• Group membership protocols, and
• Efficient algorithms to implement failure detectors in partially synchronous systems.
We found that the accelerated heartbeat protocols proposed in [M.G. Gouda and T.M. McGuire, Accelerated Heartbeat Protocols, Proc. of ICDCS’98] violate some natural and essential properties. We demonstrated the violations with counterexamples, developed techniques for handling time-triggered events in mCRL2, and investigated the correct time bounds for all the protocols. Regarding the consensus problem, we proved the correctness of the algorithms proposed in [T. Deepak Chandra and S. Toueg, Unreliable Failure Detectors for Reliable Distributed Systems, J. ACM, 1996], in which the failure detectors are unreliable (i.e., they may make mistakes).
For the group membership protocols proposed in [Y. Amir, D. Dolev, S. Kramer and D. Malki, Membership Algorithms for Multicast Communication Groups, Springer-Verlag, 1992], we found that the original specifications and the text explaining the protocols can be interpreted in different ways, and that some natural interpretations even contradict each other. Our formalizations of the different interpretations showed that the claimed properties are violated. To resolve the ambiguities, we reconstructed the protocols and model checked them. To analyze the algorithms proposed in [M. Larrea, S. Arevalo and A. Fernández, Efficient Algorithms to Implement Unreliable Failure Detectors in Partially Synchronous Systems, Proc. of DISC’99], we applied symmetry reduction techniques. We found that every algorithm encounters a deadlock if the communication channel between a pair of nodes has a bounded (yet arbitrarily large) buffer. We propose fixes for deadlock avoidance and model check the proposed algorithm in UPPAAL, FDR2 and mCRL2. We also present a comparison of these three tools for model checking one of the four protocols.
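
As a rough illustration of how exhaustive state exploration can expose a deadlock caused by a bounded channel buffer, the sketch below runs a breadth-first reachability search over a deliberately simplified two-node system. Everything in it is an assumption made for illustration: the toy protocol (each node sends `burst` messages, then receives `burst` messages, and repeats), the names `step` and `find_deadlock`, and the parameters `capacity` and `burst`. It is not one of the algorithms analyzed in the thesis, and it does not use mCRL2, UPPAAL or FDR2.

```python
from collections import deque

SEND, RECV = "send", "recv"

def step(state, i, capacity, burst):
    """Return the state after node i takes one step, or None if node i is blocked.

    A state is (phase, count, chan), each a pair indexed by node:
    phase[i] is node i's current phase, count[i] is its progress in that phase,
    and chan[i] is the number of messages queued from node i to its peer.
    """
    phase, count, chan = (list(part) for part in state)
    peer = 1 - i
    if phase[i] == SEND and chan[i] < capacity:
        chan[i] += 1            # blocking send succeeds: the buffer has room
    elif phase[i] == RECV and chan[peer] > 0:
        chan[peer] -= 1         # receive one message queued by the peer
    else:
        return None             # buffer full (send) or empty (receive)
    count[i] += 1
    if count[i] == burst:       # phase finished: switch between send and receive
        phase[i] = RECV if phase[i] == SEND else SEND
        count[i] = 0
    return tuple(phase), tuple(count), tuple(chan)

def find_deadlock(capacity, burst):
    """Breadth-first search of all reachable states; return a deadlocked state or None."""
    initial = ((SEND, SEND), (0, 0), (0, 0))
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        nexts = []
        for i in (0, 1):
            nxt = step(state, i, capacity, burst)
            if nxt is not None:
                nexts.append(nxt)
        if not nexts:           # no node can move: this state is a deadlock
            return state
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None

if __name__ == "__main__":
    print(find_deadlock(capacity=2, burst=3))
    print(find_deadlock(capacity=3, burst=3))
```

In this toy system a deadlock is reachable when `burst` exceeds `capacity`, because both nodes fill their outgoing buffers before either starts receiving. Dedicated model checkers perform the same kind of search over far richer models, including timing, and add reductions such as symmetry to keep the state space manageable.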
