Event-Predicate Detection in the Debugging of Distributed Applications

Trends in the development of computer hardware are making the use of distributed systems increasingly attractive. The collection of event-trace data and the construction of process-time diagrams can provide a useful visualization tool. In practical situations, however, these diagrams are too large for users to find them comprehensible. The ability to detect and locate arbitrary (complex) predicates within an event trace can help to alleviate this problem. This thesis enumerates five classes of problems that a successful event-detection strategy should be able to identify: phase transitions, mutual-exclusion violations, subroutines, communication symmetry, and performance bottlenecks. Some previous efforts in this area offer an expressivity which is close to that required to meet these goals, but are hampered by an insufficient understanding of the partial order which underlies causality in a distributed-execution trace. This work defines a partial-order precedence relationship for compound events, and extends two timestamping algorithms to support it. A new syntax for event-predicate definition, which comes closer to fulfilling the aforementioned framework than any of the previous efforts, is presented. Finally, a prototypical implementation, within Taylor's Partial-Order Event Tracer (POET), is described, issues encountered during its construction are discussed, and its performance is evaluated.
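To make the notion of partial-order precedence concrete, the following minimal sketch uses vector timestamps to test causal precedence between primitive events and then lifts the test to compound events. It is an illustration only, not the definition or the algorithms developed in this thesis: the names `happened_before`, `concurrent`, and `compound_precedes` are hypothetical, and the compound rule shown (every constituent of one compound event precedes every constituent of the other) is an assumed, deliberately strict choice among several possible definitions.

```python
from typing import Sequence, Tuple

# A vector timestamp: one logical-clock entry per process.
Timestamp = Tuple[int, ...]


def happened_before(a: Timestamp, b: Timestamp) -> bool:
    """Return True if the event stamped a causally precedes the event stamped b.

    Standard vector-timestamp test: a <= b in every component and a != b.
    """
    if len(a) != len(b):
        raise ValueError("timestamps must have one entry per process")
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: Timestamp, b: Timestamp) -> bool:
    """Return True if neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


def compound_precedes(A: Sequence[Timestamp], B: Sequence[Timestamp]) -> bool:
    """Assumed (hypothetical) precedence rule for compound events.

    A precedes B only if every constituent event of A happened before
    every constituent event of B. Empty compound events never precede.
    """
    if not A or not B:
        return False
    return all(happened_before(a, b) for a in A for b in B)


# Two processes: p0 sends at (1, 0) and then performs a local event (2, 0);
# p1 performs a local event (0, 1) and then receives p0's message at (1, 2).
send, local0 = (1, 0), (2, 0)
local1, recv = (0, 1), (1, 2)

ordered = happened_before(send, recv)
unordered = concurrent(local0, local1)
compound_ok = compound_precedes([send, local1], [recv])
compound_bad = compound_precedes([local0], [recv])
```

In this example the send precedes the receive, the two local events are concurrent, the compound event `{send, local1}` precedes `{recv}` under the assumed rule, and `{local0}` does not, because `local0` and `recv` are causally unordered.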
