A Comparison of Three Programming Models for Adaptive Applications on the Origin2000

Adaptive applications have computational workloads and communication patterns that change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementation of such applications is therefore a challenging task. In this paper, we compare the performance of, and the programming effort required for, two major classes of adaptive applications under three leading parallel programming models on an SGI Origin2000 system, a machine that supports all three models efficiently. Results indicate that the three models deliver comparable performance; however, even though the basic parallel algorithms are similar, the implementations differ significantly in ways that go beyond the use of explicit messages versus implicit loads/stores. Compared with the message-passing (using MPI) and SHMEM programming models, the cache-coherent shared address space (CC-SAS) model provides substantial ease of programming at both the conceptual and program orchestration levels, often accompanied by performance gains. However, CC-SAS currently has portability limitations and may suffer from poor spatial locality of physically distributed shared data on large numbers of processors.
