Global Trees: A framework for linked data structures on distributed memory parallel systems

This paper describes the Global Trees (GT) system, which provides a multi-layered interface to a global address space view of distributed tree data structures while delivering scalable performance on distributed memory systems. GT uses coarse-grained data movement to improve locality and communication efficiency. We describe the design and implementation of GT, illustrate its use in a gravitational simulation application, and present experimental results that demonstrate the effectiveness of the approach. The key benefits of the system are efficient shared-memory-style programming of distributed trees, tree-specific optimizations for data access and computation, and the ability to customize many aspects of GT to optimize application performance.
