PDB: pervasive debugging with Xen

Building distributed grid applications is notoriously difficult: the complex interactions between concurrently running processes, middleware, operating systems, underlying devices, and interconnecting networks can lead to errors that are unpredictable and difficult to analyze. Yet debugging support for such systems is woefully inadequate; typically, a central user interface coordinates a set of conventional debuggers. This structure leads to synchronization problems and is limited to debugging user-mode applications. In this paper we present the design and implementation of PDB, a pervasive debugger that executes in a virtualization layer underneath the entire distributed system. By running each node of a distributed application in a separate virtual environment atop the debugger, PDB can exercise full control over the entire execution environment.
