Accurate offline synchronization of distributed traces using kernel-level events

Tracing has proven to be a valuable tool for identifying functional and performance problems. To use it across distributed nodes, the timestamps in the traces must be precisely synchronized. This work aims to improve the synchronization of traces recorded on distributed nodes, with high precision and low intrusiveness. We present an offline trace synchronization algorithm that is directly applicable to pairs of nodes and that can report approximate bounds on accuracy over short tracing durations. We also present an efficient implementation of this algorithm and an experimental study of the parameters that affect synchronization accuracy.
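To illustrate how a pair of traces can yield bounds on synchronization accuracy, the following minimal sketch derives an interval for the clock offset between two nodes from matched send and receive timestamps. It is not the algorithm presented in this paper. It assumes a constant offset with no drift, which is at best a rough approximation over short tracing durations, and it assumes that packet events have already been matched between the two traces. The names `OffsetBounds` and `offset_bounds` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class OffsetBounds:
    """Interval containing the offset (clock B minus clock A)."""
    lower: float
    upper: float

    @property
    def estimate(self) -> float:
        # Midpoint of the interval, used as the offset estimate.
        return (self.lower + self.upper) / 2

    @property
    def accuracy(self) -> float:
        # Half-width of the interval: worst-case error of the estimate.
        return (self.upper - self.lower) / 2


def offset_bounds(
    a_to_b: Iterable[Tuple[float, float]],
    b_to_a: Iterable[Tuple[float, float]],
) -> OffsetBounds:
    """Bound the offset of clock B relative to clock A.

    a_to_b: (send time on A's clock, receive time on B's clock) pairs.
    b_to_a: (send time on B's clock, receive time on A's clock) pairs.

    Assumption (not from the paper): constant offset, no drift.
    A message cannot arrive before it is sent, so every A->B message
    gives an upper bound and every B->A message gives a lower bound.
    """
    upper = min(recv_b - send_a for send_a, recv_b in a_to_b)
    lower = max(send_b - recv_a for send_b, recv_a in b_to_a)
    if lower > upper:
        raise ValueError("inconsistent timestamps: drift or mismatched events")
    return OffsetBounds(lower, upper)


# Timestamps in microseconds; clock B is roughly 100 ms ahead of clock A.
bounds = offset_bounds(
    a_to_b=[(0, 100_400), (10_000, 110_200)],
    b_to_a=[(105_000, 5_300), (120_000, 20_100)],
)
print(bounds.estimate, bounds.accuracy)  # prints "100050.0 150.0"
```

The interval narrows as more messages with short network delays are observed in both directions, which is why the reported accuracy depends on the traffic exchanged between the two nodes during tracing.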
