Supporting Relative Debugging for Large-scale UPC Programs

Relative debugging is a useful technique for locating errors that emerge when existing code is ported to a new programming language or a new computing platform. Recent interest in the UPC programming language has led to a number of conventional parallel programs, for example MPI programs, being ported to UPC. This paper gives an overview of the data distribution concepts used in UPC and identifies the challenges in supporting relative debugging for UPC programs that run on large supercomputers. The proposed solution is implemented in an existing parallel relative debugger, CCDB, and its performance is evaluated on a Cray XE6 system with 16,348 cores.
