Locally-Oriented Programming: A Simple Programming Model for Stencil-Based Computations on Multi-Level Distributed Memory Architectures

Emerging hybrid accelerator architectures for high-performance computing are often well suited to a data-parallel programming model. Unfortunately, programmers of these architectures face a steep learning curve that frequently requires learning a new language (e.g., OpenCL). Furthermore, the distributed (and frequently multi-level) memory organization of clusters of these machines adds another layer of complexity. This paper presents preliminary work examining how programming with a local orientation can provide simpler access to accelerator architectures. A locally-oriented programming model is especially useful for algorithms that apply a stencil or convolution kernel. In this model, the programmer codes the algorithm by modifying only a single array element (called the local element), while having read-only access to a small sub-array surrounding that element. We demonstrate how a locally-oriented programming model can be adopted as a language extension using source-to-source program transformations.
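To make the "local element plus read-only neighborhood" contract concrete, here is a minimal sketch of its semantics in Python. It is not the paper's implementation: the paper realizes the model as a language extension through source-to-source transformations, whereas this sketch simply uses an ordinary function and an interpreter loop. The names `Neighborhood`, `apply_local`, and `laplace_kernel` are hypothetical, and having the kernel *return* the new local value (rather than assign to it) is an assumption chosen to make "modify only the local element" impossible to violate.

```python
from typing import Callable, List

Grid = List[List[float]]


class Neighborhood:
    """Hypothetical read-only view of the sub-array around one local element.

    Indices are offsets (di, dj) relative to the local element and must
    satisfy |di| <= halo and |dj| <= halo. No __setitem__ is defined, so a
    kernel cannot write into the neighborhood.
    """

    __slots__ = ("_grid", "_i", "_j", "_halo")

    def __init__(self, grid: Grid, i: int, j: int, halo: int) -> None:
        self._grid = grid
        self._i = i
        self._j = j
        self._halo = halo

    def __getitem__(self, offset: tuple) -> float:
        di, dj = offset
        if abs(di) > self._halo or abs(dj) > self._halo:
            raise IndexError(f"offset {offset} lies outside halo width {self._halo}")
        return self._grid[self._i + di][self._j + dj]


# A kernel sees only the neighborhood and returns the new local value.
Kernel = Callable[[Neighborhood], float]


def laplace_kernel(nb: Neighborhood) -> float:
    """Five-point stencil: new local value is the mean of its four neighbors."""
    return 0.25 * (nb[-1, 0] + nb[1, 0] + nb[0, -1] + nb[0, 1])


def apply_local(kernel: Kernel, grid: Grid, halo: int = 1) -> Grid:
    """Apply `kernel` once to every interior element of `grid`.

    All reads come from `grid` and all writes go to a fresh copy, so the
    per-element updates are independent and could run in any order (or in
    parallel). Cells within `halo` of the edge are copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(halo, rows - halo):
        for j in range(halo, cols - halo):
            out[i][j] = kernel(Neighborhood(grid, i, j, halo))
    return out


if __name__ == "__main__":
    grid = [[1.0, 1.0, 1.0, 1.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]
    print(apply_local(laplace_kernel, grid)[1])  # [0.0, 0.25, 0.25, 0.0]
```

Because the kernel never sees loop indices, array bounds, or other elements' outputs, the same kernel body could in principle be mapped by a translator onto accelerator work-items or distributed sub-domains with halo exchange; that mapping is exactly the part this sketch leaves to the plain `for` loops in `apply_local`.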
