Extending C++ with co-array semantics

The current trend of large scientific computing problems is to align as much as possible to a Single Programming Multiple Data (or SPMD) scheme when the application algorithms are conducive to parallelization and vectorization. This reduces the complexity of code because the processors or (computational nodes) perform the same instructions which allows for better performance as algorithms work on local data sets instead of continuously transferring data from one locality to another. However, certain applications, such as stencil problems, demonstrate the need to move data to or from remote localities. This involves an additional degree of complexity, as one must know with which localities to exchange data. In order to solve this issue, Fortran has extended its scalar element indexing approach to distributed structures of elements. In this extension, a structure of scalar elements is attributed a ”co-index” and lives in a specific locality. A co-index provides the application with enough information to retrieve the corresponding data reference. In C++, containers present themselves as a ”smarter” alternative of Fortran arrays but there are still no corresponding standardized features similar to the Fortran co-indexing approach. In this paper, we present an implementation of such features in HPX, a general purpose C++ runtime system for applications of any scale. We describe how the combination of the HPX features and the actual C++ Standard makes it easy to define a high performance API similar to Co-Array Fortran.

[1]  Bradford L. Chamberlain,et al.  Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..

[2]  Mark Hoemmen,et al.  Communication-avoiding Krylov subspace methods , 2010 .

[3]  Dietmar Fey,et al.  Higher-level parallelization for local and distributed asynchronous task-based programming , 2015, ESPM '15.

[4]  Robert W. Numrich,et al.  Co-arrays in the next Fortran Standard , 2005, FORF.

[5]  Thomas L. Sterling,et al.  ParalleX An Advanced Parallel Execution Model for Scaling-Impaired Applications , 2009, 2009 International Conference on Parallel Processing Workshops.

[6]  José E. Moreira,et al.  A C++ implementation of the co-array programming model for blue gene/L , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[7]  Wim Vanroose,et al.  Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel Machines , 2013, SIAM J. Sci. Comput..

[8]  Katherine A. Yelick,et al.  Titanium: A High-performance Java Dialect , 1998, Concurr. Pract. Exp..

[9]  Hartmut Kaiser,et al.  HPX: A Task Based Programming Model in a Global Address Space , 2014, PGAS.

[10]  Laxmikant V. Kalé,et al.  CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.

[11]  Robert W. Numrich,et al.  Co-array Fortran for parallel programming , 1998, FORF.

[12]  Vivek Sarkar,et al.  X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.