X10 as a Parallel Language for Scientific Computation: Practice and Experience

X10 is an emerging Partitioned Global Address Space (PGAS) language intended to significantly increase the productivity of developing scalable HPC applications. The language has now matured to the point where it is meaningful to consider writing large-scale scientific application codes in X10. This paper reports our experiences writing three codes from the chemistry/materials science domain entirely in X10: Fast Multipole Method (FMM), Particle Mesh Ewald (PME) and Hartree-Fock (HF). Performance results are presented for up to 256 places on a Blue Gene/P system. During the course of this work we shared our experiences with the X10 development team, so that application requirements could inform language design discussions, just as the language capabilities influenced algorithm design. This resulted in improvements to the language implementation and standard class libraries, including the design of the array API and support for complex math. Data constructs in X10 such as \emph{places} and \emph{distributed arrays}, and parallel constructs such as \emph{finish} and \emph{async}, simplify implementation of the applications in comparison with MPI. However, implementation limitations in X10 2.1.2 make it difficult to achieve scalable performance using the most natural expressions of the algorithms. The most serious limitation is that parallel constructs and array operations are implemented with point-to-point communication patterns rather than collectives. This issue will be addressed in future releases of X10.
