Efficient dependency tracking for relevant events in concurrent systems

In a concurrent system with N processes, vector clocks of size N are used to track dependencies between events. Vectors of size N lead to scalability problems. Moreover, associating clock components with processes makes vector clocks cumbersome and inefficient for systems with a dynamic number of processes. We present a class of logical clock algorithms, called chain clocks, for tracking dependencies between relevant events; they are based on generalizing a process to any chain in the computation poset. Chain clocks can generally track dependencies using fewer than N components, and they adapt automatically to systems with a dynamic number of processes. We compared the performance of the Dynamic Chain Clock (DCC) with that of vector clocks for multithreaded programs in Java. When 1% of all events are relevant, DCC requires 10 times fewer components than a vector clock, and the timestamp traces are smaller by a factor of 100. For the same case, although DCC requires shared data structures, it is still 10 times faster than the vector clock in our experiments. We also study the class of chain clocks that perform optimally for posets of small width, and we show that a single algorithm cannot perform optimally for posets of small width as well as posets of large width.
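The idea of assigning clock components to chains rather than to processes can be illustrated with a minimal single-threaded sketch. It is not the paper's DCC algorithm: the names `ChainClock`, `stamp`, `top`, `merge`, and `happened_before` are hypothetical, and the rule used here (an event extends chain j only if its causal past already contains the latest event of chain j, otherwise a new chain is opened) is a simplifying assumption.

```python
from typing import List


def _pad(v: List[int], n: int) -> List[int]:
    """Return a copy of v extended with zeros to length n."""
    return v + [0] * (n - len(v))


def merge(a: List[int], b: List[int]) -> List[int]:
    """Component-wise maximum, as in ordinary vector clocks."""
    n = max(len(a), len(b))
    return [max(x, y) for x, y in zip(_pad(a, n), _pad(b, n))]


def happened_before(a: List[int], b: List[int]) -> bool:
    """True if timestamp a is strictly less than timestamp b."""
    n = max(len(a), len(b))
    pa, pb = _pad(a, n), _pad(b, n)
    return all(x <= y for x, y in zip(pa, pb)) and pa != pb


class ChainClock:
    """Hypothetical shared state: top[j] is the latest value issued on chain j."""

    def __init__(self) -> None:
        self.top: List[int] = []

    def stamp(self, seen: List[int]) -> List[int]:
        """Timestamp a relevant event whose causal past is summarized by `seen`."""
        v = _pad(seen, len(self.top))
        for j, latest in enumerate(self.top):
            if v[j] == latest:  # event already follows the last event of chain j
                break
        else:
            # No existing chain can be extended: open a new component.
            self.top.append(0)
            v.append(0)
            j = len(self.top) - 1
        self.top[j] += 1
        v[j] = self.top[j]
        return v


# Two threads A and B; B's event causally follows A's first event.
clock = ChainClock()
a1 = clock.stamp([])             # [1]    first chain is created
b1 = clock.stamp(merge([], a1))  # [2]    B extends A's chain, no new component
a2 = clock.stamp(a1)             # [1, 1] A has not seen b1, so a new chain opens
```

In this run the first two events share a single component even though they occur on different threads; a second component appears only when two events are concurrent (`b1` and `a2`).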
