Trace Recovery in Multi-Processing Systems: Architectural Considerations

Execution monitoring plays a central role in most software development tools for parallel and distributed computer systems. However, such monitoring may introduce delays that corrupt the timing of trace events. We recently demonstrated that, given a safe timed Petri net model of the monitored software, the timestamp values that would have been observed in the absence of those delays can sometimes be recovered from the corrupted trace. In this paper we discuss the architectural implications of applying this result to multi-processing systems.
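As a rough illustration of what recovering a timestamp means, the sketch below treats the simplest possible case. It is not the Petri net method referred to above. It assumes a single sequential process and a known, constant probe cost that is paid after each timestamp is taken; the function name, the parameter `probe_delay`, and the sample values are all hypothetical.

```python
def recover_timestamps(observed, probe_delay):
    """Remove accumulated probe overhead from one process's trace.

    Assumption (not from the paper): a single sequential process in which
    every recorded event costs the same known `probe_delay`, paid after its
    timestamp is taken. Event k is therefore late by k * probe_delay.
    """
    return [t - k * probe_delay for k, t in enumerate(observed)]


observed = [10, 25, 47, 60]  # hypothetical timestamps, in microseconds
print(recover_timestamps(observed, probe_delay=3))  # [10, 22, 41, 51]
```

When processes interact, a delay in one process can propagate to others through synchronization, so a per-process subtraction of this kind is no longer sufficient on its own. That is the situation in which a model of the monitored software becomes necessary.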
