Global Trees: a framework for linked data structures on distributed memory parallel systems

This paper describes the Global Trees (GT) system, which provides a multi-layered interface to a global address space view of distributed tree data structures while delivering scalable performance on distributed memory systems. GT uses coarse-grained data movement to improve locality and communication efficiency. We describe the design and implementation of GT, illustrate its use in the context of a gravitational simulation application, and present experimental results that demonstrate the effectiveness of the approach. The key benefits of the system include efficient shared-memory style programming of distributed trees, tree-specific optimizations for data access and computation, and the ability to customize many aspects of GT to optimize application performance.
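The abstract does not show GT's actual interface, so the following is only a toy, single-process sketch of the general idea behind coarse-grained access to a distributed tree. It assumes that a global pointer names the owning process, a chunk of nodes on that process, and a slot within the chunk, and that reading a remote node fetches and caches its whole chunk. The names `GPtr`, `GlobalTree`, `alloc`, `get`, `build` and `subtree_sum` are hypothetical and are not GT's API; "processes" here are simulated by per-rank lists.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List, Optional, Tuple


# Hypothetical global pointer: owning rank, chunk on that rank, slot in the chunk.
@dataclass(frozen=True)
class GPtr:
    rank: int
    chunk: int
    slot: int


@dataclass
class Node:
    value: int
    children: List[Optional[GPtr]] = field(default_factory=lambda: [None, None])


class GlobalTree:
    """Toy model of a chunked, distributed binary tree (an assumption, not GT's design)."""

    def __init__(self, nprocs: int, chunk_size: int):
        self.chunk_size = chunk_size
        # storage[rank][chunk] holds the nodes owned by that simulated rank.
        self.storage: List[List[List[Node]]] = [[] for _ in range(nprocs)]
        # Per-reader cache of remote chunks, keyed by (owner rank, chunk id).
        self.cache: Dict[Tuple[int, int], List[Node]] = {}
        self.remote_fetches = 0

    def alloc(self, rank: int, value: int) -> GPtr:
        chunks = self.storage[rank]
        if not chunks or len(chunks[-1]) == self.chunk_size:
            chunks.append([])  # start a new chunk when the current one is full
        chunks[-1].append(Node(value))
        return GPtr(rank, len(chunks) - 1, len(chunks[-1]) - 1)

    def get(self, me: int, p: GPtr) -> Node:
        if p.rank == me:
            return self.storage[me][p.chunk][p.slot]  # local access, no communication
        key = (p.rank, p.chunk)
        if key not in self.cache:
            # One coarse-grained transfer copies the whole chunk (read-only snapshot).
            self.remote_fetches += 1
            src = self.storage[p.rank][p.chunk]
            self.cache[key] = [Node(n.value, list(n.children)) for n in src]
        return self.cache[key][p.slot]


def build(tree: GlobalTree, rank: int, depth: int, ids: Iterator[int]) -> GPtr:
    """Allocate a complete binary tree on one rank in preorder."""
    p = tree.alloc(rank, next(ids))
    if depth > 0:
        node = tree.storage[p.rank][p.chunk][p.slot]
        node.children = [build(tree, rank, depth - 1, ids),
                         build(tree, rank, depth - 1, ids)]
    return p


def subtree_sum(tree: GlobalTree, me: int, p: Optional[GPtr]) -> int:
    """Traverse the tree from rank `me`, counting remote chunk fetches as we go."""
    if p is None:
        return 0
    node = tree.get(me, p)
    return node.value + sum(subtree_sum(tree, me, c) for c in node.children)


if __name__ == "__main__":
    for chunk_size in (1, 4):
        t = GlobalTree(nprocs=2, chunk_size=chunk_size)
        root = build(t, rank=1, depth=2, ids=count())
        total = subtree_sum(t, me=0, p=root)
        print(chunk_size, total, t.remote_fetches)
```

With seven nodes owned by rank 1, a traversal from rank 0 triggers one remote fetch per node when `chunk_size` is 1 but only two fetches when `chunk_size` is 4, because nodes allocated together in preorder also tend to be visited together. A real system would additionally need a policy for writes and cache invalidation, which this read-only sketch deliberately omits.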
