Monitoring Partially Synchronous Distributed Systems Using SMT Solvers

In this paper, we discuss the feasibility of monitoring partially synchronous distributed systems to detect latent bugs, i.e., errors caused by concurrency and race conditions among concurrent processes. We present a monitoring framework where we model both system constraints and latent bugs as Satisfiability Modulo Theories (SMT) formulas, and we detect the presence of latent bugs using an SMT solver. We demonstrate the feasibility of our framework using both synthetic applications where latent bugs occur at any time with random probability and an application involving exclusive access to a shared resource with a subtle timing bug. We illustrate how the time required for verification is affected by parameters such as communication frequency, latency, and clock skew. Our results show that our framework can be used for real-life applications, and because our framework uses SMT solvers, the range of appropriate applications will increase as these solvers become more efficient over time.
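To illustrate the idea of encoding system constraints and a latent bug as a single satisfiability question, the following is a minimal sketch, not the framework described above. It makes several assumptions that are not from the paper: each process has a constant integer clock offset bounded by a hypothetical skew parameter `eps`, the latent bug is an overlap of two critical sections in real time, and a brute-force enumeration stands in for the SMT solver query. The function name `overlap_possible` and the interval representation are illustrative only.

```python
from itertools import product


def overlap_possible(intervals, eps):
    """Search for clock offsets that expose a mutual-exclusion violation.

    intervals: list of (start, end) local timestamps, one per process,
               during which that process believes it holds the resource.
    eps:       assumed bound on clock skew (integer time units).

    Returns a tuple of per-process offsets (a witness) if some assignment
    within [-eps, eps] makes two intervals overlap in real time, else None.
    This exhaustive search plays the role an SMT solver would play on the
    corresponding formula.
    """
    offsets = range(-eps, eps + 1)
    for ds in product(offsets, repeat=len(intervals)):
        # System constraint: real time = local time + per-process offset.
        shifted = [(s + d, e + d) for (s, e), d in zip(intervals, ds)]
        # Latent bug: two processes hold the resource at the same instant.
        for i in range(len(shifted)):
            for j in range(i + 1, len(shifted)):
                latest_start = max(shifted[i][0], shifted[j][0])
                earliest_end = min(shifted[i][1], shifted[j][1])
                if latest_start < earliest_end:
                    return ds
    return None


if __name__ == "__main__":
    trace = [(0, 10), (12, 20)]  # local timestamps look safely disjoint
    print(overlap_possible(trace, 1))  # small skew
    print(overlap_possible(trace, 2))  # larger skew
```

In this sketch the two critical sections appear disjoint when read from local clocks, yet a large enough skew bound admits an interleaving in which they overlap. The exhaustive search grows exponentially with the number of processes, which is one reason to hand such formulas to a solver instead.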
