Scalable timestamp synchronization for event traces of message-passing applications

Event traces help in understanding the performance behavior of message-passing applications because they allow in-depth analysis of communication and synchronization patterns. However, the absence of synchronized clocks may render the analysis ineffective: inaccurate relative event timings may misrepresent the logical event order and lead to errors when quantifying the impact of certain behaviors. Although linear offset interpolation can restore consistency to some degree, time-dependent drifts and other inaccuracies may still disarrange the original succession of events, especially during longer runs. The controlled logical clock algorithm corrects such violations in point-to-point communication by shifting message events in time as much as needed while trying to preserve the length of local intervals. In this article, we describe how the controlled logical clock is extended to collective communication to enable the correction of realistic message-passing traces. We present a parallel version of the algorithm that scales to more than a thousand processes, and we evaluate its accuracy by showing that it eliminates inconsistent inter-process timings while preserving the length of local intervals.
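The two correction steps named in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm: the controlled logical clock amortizes corrections gradually, whereas this simplified version shifts all later local events by the full amount. The function names, the event tuple layout, and the `min_latency` parameter are assumptions made for illustration only.

```python
def interpolate_offset(t, t1, o1, t2, o2):
    """Linear offset interpolation (sketch).

    Maps a local timestamp t to the master clock, assuming the offset to the
    master was measured as o1 at local time t1 and as o2 at local time t2,
    and that the drift between the two measurements is constant.
    """
    drift = (o2 - o1) / (t2 - t1)
    return t + o1 + drift * (t - t1)


def enforce_clock_condition(events, send_times, min_latency=0.0):
    """Simplified forward correction for one process (hypothetical helper).

    events      : list of (timestamp, kind, msg_id) in local order
    send_times  : dict mapping msg_id to the sender's corrected timestamp
    min_latency : assumed lower bound on message transfer time

    A receive must not happen before send + min_latency. When it does, the
    receive is moved forward and every later local event is shifted by the
    same amount, so the intervals after the receive keep their length.
    """
    corrected = []
    shift = 0.0
    for ts, kind, msg_id in events:
        new_ts = ts + shift
        if kind == "recv":
            earliest = send_times[msg_id] + min_latency
            if new_ts < earliest:
                shift += earliest - new_ts
                new_ts = earliest
        corrected.append(new_ts)
    return corrected
```

In this sketch the interval immediately before a corrected receive is stretched while all later intervals are kept. The controlled logical clock described in the article aims to limit such distortion rather than apply a permanent shift.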
