A parallel and distributed debugger implemented with Java

In an ongoing project at the National High Performance Computing Center (NHPCC) in Hefei, China, we are building a debugger for parallel and distributed programs that run on the Dawning cluster system, a cluster of homogeneous workstations. Such debuggers are commonly built by layering a sophisticated user interface on top of existing sequential debuggers such as dbx or gdb. We take an object-oriented approach and use Java as the implementation language. This paper describes how we designed and implemented the debugger, DCDB, the problems we encountered, and their solutions.
