Event-Based Techniques to Debug an Object Request Broker

This work presents a debugging system built for the Object Request Broker (ORB) used in the construction of Solaris MC, a multicomputer operating system. Although it was built and tested on one particular ORB, we believe similar ideas could be applied to other ORBs with similar structure and goals. The goal of the system is to stress the ORB in a controlled manner while logging the events that occur during its execution. The tool, called the Fault Injection and Event Logging Tool (FIELT), helps system programmers find possible inconsistencies in the code through post-mortem analysis of the collected trace data. The design of the event logging follows event-driven techniques for monitoring distributed systems. Failures are injected into the ORB by software instrumentation, and these injected failures are treated as special events. This allows us to reason about the correctness of the ORB in a broad sense, in which its expected behavior includes coping gracefully with failures. The number of potentially relevant events produced during ORB execution is unmanageably high. There is thus a need to find a minimal subset of those events that, without losing relevant system behavior, allows us to infer the ORB's correctness (or lack thereof). We address this problem with a new model for ORB computations that assigns each event produced by the ORB to one of the high-level objects it manages.
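
The idea of logging injected failures as ordinary trace events, and of attributing every event to a high-level object, can be illustrated with a minimal sketch. It is not FIELT's actual design: the class names, the event kinds ("invoke", "reply", "exception", "fault"), and the post-mortem check are all hypothetical, chosen only to show the shape of a per-object trace analysis.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    seq: int      # position of the event in the collected trace
    node: str     # hypothetical identifier of the node that produced it
    obj_id: str   # high-level object the event is attributed to (assumed key)
    kind: str     # hypothetical kinds: "invoke", "reply", "exception", "fault"


class EventLog:
    """In-memory trace; a real tool would write to persistent storage."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, node: str, obj_id: str, kind: str) -> Event:
        ev = Event(len(self._events), node, obj_id, kind)
        self._events.append(ev)
        return ev

    def inject_fault(self, node: str, obj_id: str) -> Event:
        # An injected failure is logged like any other event, with a special kind.
        return self.record(node, obj_id, "fault")

    def by_object(self) -> Dict[str, List[Event]]:
        # Group the trace by high-level object, preserving trace order.
        groups: Dict[str, List[Event]] = defaultdict(list)
        for ev in self._events:
            groups[ev.obj_id].append(ev)
        return dict(groups)


def is_resolved(events: List[Event]) -> bool:
    """Assumed post-mortem check: every invocation on an object ends in a
    reply or an exception, even when a fault was injected in between."""
    pending = 0
    for ev in events:
        if ev.kind == "invoke":
            pending += 1
        elif ev.kind in ("reply", "exception"):
            pending -= 1
    return pending == 0
```

Under these assumptions, a post-mortem pass would call `by_object()` on the collected log and run `is_resolved` on each group, flagging objects whose invocations were left pending after an injected fault.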
