Debugging of heterogeneous parallel systems

The Agora system supports the development of heterogeneous parallel programs, that is, programs written in multiple languages and running on heterogeneous machines. Agora has been used since September 1986 in a large distributed system [1]: two versions of the application were demonstrated in one year, against an expectation of two years per version. Ease of debugging is one of the reasons for this gain in productivity. It stems both from the debugger's deeper understanding of parallel systems and from a novel feature: the ability to replay the execution of parallel systems built with Agora. A user can repeat a failed execution exactly, any number of times and at a slower pace. This makes it easy to identify time-dependent errors, which are peculiar to parallel and distributed systems. The debugger can also be customized to support user-defined synchronization primitives built on top of the system-provided ones. The Agora debugger tackles three sets of problems that no earlier parallel debugger has addressed simultaneously: programming-in-the-large, multiple processes written in different languages, and multiple machine architectures.

[1]  Leslie Lamport. Time, clocks, and the ordering of events in a distributed system, 1978.

[2]  David L. Black, et al. Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures, 1987, IEEE Trans. Computers.

[3]  S. Y. Chiu. Debugging distributed computations in a nested atomic action system, 1984.

[4]  R. D. Schiffenbauer. Interactive debugging in a distributed computational, 1981.

[5]  Bernd Bruegge, et al. Adaptability and portability of symbolic debuggers, 1985.

[6]  Edward Tucker Smith. Debugging techniques for communicating, loosely-coupled processes, 1982.

[7]  Larry Masinter, et al. The Interlisp Programming Environment, 1981, Computer.

[8]  David A. Moon, et al. The Symbolics Genera Programming Environment, 1987, IEEE Software.

[9]  Alessandro Forin, et al. Multilanguage Parallel Programming of Heterogeneous Machines, 1988, IEEE Trans. Computers.

[10]  R. Lathe. PhD by thesis, 1988, Nature.

[11]  Jack C. Wileden, et al. High-level debugging of distributed systems: The behavioral abstraction approach, 1983, J. Syst. Softw.

[12]  Alessandro Forin, et al. Parallel processing with Agora, 1987.

[13]  David L. Black, et al. Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures, 1987, ASPLOS 1987.

[14]  Craig Schaffert, et al. CLU Reference Manual, 1984, Lecture Notes in Computer Science.

[15]  Thomas J. LeBlanc, et al. Debugging Parallel Programs with Instant Replay, 1987, IEEE Transactions on Computers.