Support for adaptivity in ARMCI using migratable objects

Many new parallel programming paradigms have emerged that compete with and complement the standard, well-established MPI model. Most notable and successful among these are models that support some form of global address space. At the same time, approaches based on migratable objects (also called virtualized processes) have shown that resource management concerns can be separated effectively from the overall parallel programming effort. For example, Charm++ supports dynamic load balancing via an intelligent adaptive run-time system. It is also becoming clear that a multi-paradigm approach, in which modules written in one or more paradigms coexist and cooperate, will be necessary to tame the parallel programming challenge. ARMCI is a remote memory copy library that serves as a foundation for many global address space languages and libraries. This paper presents our preliminary work on integrating ARMCI with the adaptive run-time system of Charm++, as part of our overall effort toward the multi-paradigm approach.
