Assessing the Performance Impact of using an Active Global Address Space in HPX: A Case for AGAS

In this research, we describe the functionality of AGAS (Active Global Address Space), a subsystem of the HPX runtime system designed to handle data locality at runtime, independent of the hardware and architecture configuration. AGAS enables transparent global data access and data migration at runtime, but incurs an overhead cost. We present a method to assess the performance of AGAS and to quantify its impact on the execution time of the Octo-Tiger application. Using our assessment method, we identify the four most expensive AGAS operations in HPX and demonstrate that the overhead caused by AGAS is negligible.
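The core idea behind an active global address space can be illustrated with a small sketch: objects are addressed by a stable global ID (GID), and a runtime-maintained table maps each GID to the locality currently holding the object, so migration updates the table without invalidating references held by callers. This is a minimal toy model written for illustration only; the class and method names (`GlobalAddressSpace`, `register`, `resolve`, `migrate`) are assumptions and do not reflect HPX's actual API.

```python
import itertools

class GlobalAddressSpace:
    """Toy AGAS-like registry: maps stable global IDs to localities."""

    def __init__(self):
        self._next_gid = itertools.count(1)
        self._table = {}  # gid -> (locality, object)

    def register(self, obj, locality):
        """Assign a fresh GID to obj, recording its current locality."""
        gid = next(self._next_gid)
        self._table[gid] = (locality, obj)
        return gid

    def resolve(self, gid):
        """Return the locality currently holding the object behind gid."""
        return self._table[gid][0]

    def migrate(self, gid, new_locality):
        """Move the object to another locality; callers' GIDs stay valid."""
        _, obj = self._table[gid]
        self._table[gid] = (new_locality, obj)

    def read(self, gid):
        """Access the object through its GID, wherever it now lives."""
        return self._table[gid][1]

agas = GlobalAddressSpace()
gid = agas.register({"mass": 1.0}, locality=0)
agas.migrate(gid, new_locality=3)
print(agas.resolve(gid))  # the same GID now resolves to locality 3
print(agas.read(gid))     # the data remains reachable through the GID
```

The resolution step (`resolve`) is precisely where runtime overhead enters: every remote access must consult the table, which is why measuring the cost of such AGAS operations is the focus of this work.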
