The AllScale Runtime Application Model

Contemporary state-of-the-art runtime systems underlying widely used general-purpose parallel programming languages and libraries such as OpenMP, MPI, or OpenCL provide the foundation for accessing the parallel capabilities of modern computing architectures. In the tradition of their imperative host languages, these runtime systems focus on providing means for the distribution and synchronization of operations, while the organization and management of the manipulated data is left to application developers. Consequently, the distribution of data remains inaccessible to these runtime systems. However, many desirable system-level features depend on a runtime system's ability to exercise control over the distribution of data, so the programming models underlying traditional systems cannot support such features. In this paper, we present a novel application model that grants parallel runtime systems system-wide control over the distribution of user-defined shared data structures. Our model exploits the high-level nature of parallel programming languages, in particular the use of well-typed data structures and the associated hiding of implementation details from application developers. By generalizing such data structures and extending the resulting abstraction with features that facilitate the automated management of their distribution, our model enables runtime systems to dynamically influence the placement and replication of shared data. This paper provides a rigorous formal description of our application model, along with details of our prototype implementation and experimental results demonstrating its ability to efficiently and scalably manage various data structures in real-world environments.
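To make the abstraction concrete, the following minimal C++ sketch illustrates the kind of data-structure generalization the model relies on. It is a sketch under stated assumptions, not the actual AllScale interface: the names Interval, Fragment, and their member functions are hypothetical. The idea it shows is that a distributed data item exposes a "region" type with set operations and a "fragment" type (the node-local storage for one region) that a runtime can create, compare, and merge, so placement and replication decisions need no application involvement.

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D "region" type describing a part of a data item.
// Region types must support set operations so the runtime can split,
// merge, and compare the ownership of data-item parts.
struct Interval {
    std::size_t begin = 0, end = 0;   // half-open range [begin, end)

    bool empty() const { return begin >= end; }

    static Interval intersect(const Interval& a, const Interval& b) {
        Interval r{ std::max(a.begin, b.begin), std::min(a.end, b.end) };
        return r.empty() ? Interval{} : r;
    }

    static Interval merge(const Interval& a, const Interval& b) {
        if (a.empty()) return b;
        if (b.empty()) return a;
        return { std::min(a.begin, b.begin), std::max(a.end, b.end) };
    }
};

// Hypothetical "fragment": node-local storage covering one region of a
// distributed array-like data item. The runtime may create and transfer
// fragments without involving application code.
template <typename T>
class Fragment {
    Interval covered_;
    std::vector<T> data_;
public:
    explicit Fragment(Interval region)
        : covered_(region), data_(region.end - region.begin) {}

    const Interval& getCoveredRegion() const { return covered_; }

    // Element access in global coordinates, restricted to the covered region.
    T& operator[](std::size_t i) {
        assert(i >= covered_.begin && i < covered_.end);
        return data_[i - covered_.begin];
    }

    // Copy the overlap of another fragment into this one -- the basic
    // operation a runtime needs in order to replicate or migrate data.
    void insert(const Fragment& src) {
        Interval common = Interval::intersect(covered_, src.covered_);
        for (std::size_t i = common.begin; i < common.end; ++i)
            data_[i - covered_.begin] = src.data_[i - src.covered_.begin];
    }
};

With such an interface, a runtime could, for example, replicate a fragment on a second node and later use insert to merge the modified overlap back, all without the application observing the transfer; the well-typed region abstraction is what makes this automation possible.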
