Interlock avoidance in transparent and dynamic parallel program instrumentation using logical clocks

A fundamental problem with run-time monitoring of parallel programs is the intrusion introduced by instrumenting the original program. To minimize this intrusion, the logical clock approach (LCA) was proposed. It uses logical clocks to time and control the ordering of communication events during monitoring, so that the monitored execution reflects the behavior the program would have when running without monitoring. The main problem with LCA is that, under non-deterministic communication, several processes may wait on each other's logical clocks to advance, producing an interlock in which none of the processes can continue to execute. This paper presents a strategy for avoiding interlock, based on the concept of a ready condition. It describes how the logical clocks are updated and how communications are controlled to maintain the ordering of events under a relaxed communication model. Compared with the original LCA, the new interlock avoidance approach is simpler and introduces less overhead. In addition, the modified logical clock mechanisms are more general and apply to a wider range of parallel computing systems.
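For orientation, the general idea of a logical clock can be sketched as follows. This is a minimal illustration of the classic Lamport-style update rule, not the LCA mechanism or the ready condition, whose details the abstract does not give. The class name `LogicalClock` and its method names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class LogicalClock:
    """Hypothetical per-process logical clock (Lamport-style rule)."""

    time: int = 0

    def local_event(self) -> int:
        # Any local event advances the clock by one tick.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned value is the message timestamp.
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        # On receipt, move past both the local clock and the timestamp,
        # so the receive is ordered after the matching send.
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    p, q = LogicalClock(), LogicalClock()
    p.local_event()
    stamp = p.send()
    q.receive(stamp)
    print(p.time, q.time)
```

A monitor that delays a receive until the sender's clock has advanced far enough can block indefinitely if the processes involved are all waiting on one another, which is the kind of interlock the abstract refers to.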
