HPX – An open source C++ Standard Library for Parallelism and Concurrency

To achieve scalability with today’s heterogeneous HPC resources, we need a dramatic shift in our thinking; MPI+X is not enough. Asynchronous Many-Task (AMT) runtime systems break down the global barriers imposed by the Bulk Synchronous Programming model. HPX is an open-source, C++ Standards-compliant AMT runtime system developed by a diverse international community of collaborators called The STE||AR Group. HPX provides features that allow application developers to naturally use key design patterns, such as overlapping communication and computation, decentralizing control flow, oversubscribing execution resources, and sending work to data instead of data to work. The STE||AR Group comprises physicists, engineers, and computer scientists; men and women from many different institutions and affiliations, and from over a dozen different countries. We are committed to advancing the development of scalable parallel applications by providing a platform for collaborating and exchanging ideas. In this paper, we give a detailed description of the features HPX provides and how they help achieve scalability and programmability, a list of applications of HPX including two large NSF-funded collaborations (STORM, for storm surge forecasting; and STAR (OctoTiger), an astrophysics project which runs at 96.8% parallel efficiency on 643,280 cores), and we end with a description of how HPX and the STE||AR Group fit into the open source community.
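
The futures-and-continuations style that underlies these patterns can be illustrated with a minimal sketch (not taken from the paper). The helper partial_sum is hypothetical, and the header names follow recent HPX releases and may differ between versions; the intent is only to show asynchronous task launch and continuation attachment in place of a global barrier.

#include <hpx/hpx_main.hpp>   // wraps main() so it runs on the HPX runtime (assumed header layout)
#include <hpx/future.hpp>     // hpx::async, hpx::future, hpx::when_all (assumed header layout)

#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

// An independent piece of work that can run as a lightweight HPX task.
double partial_sum(std::vector<double> const& v, std::size_t first, std::size_t last)
{
    return std::accumulate(v.begin() + first, v.begin() + last, 0.0);
}

int main()
{
    std::vector<double> data(1'000'000, 1.0);

    // Launch both halves asynchronously; the scheduler is free to overlap
    // them with each other and with any communication in flight.
    hpx::future<double> lo =
        hpx::async(partial_sum, std::cref(data), 0, data.size() / 2);
    hpx::future<double> hi =
        hpx::async(partial_sum, std::cref(data), data.size() / 2, data.size());

    // Attach a continuation instead of inserting a global barrier: control
    // flow follows the data dependencies expressed by the futures.
    hpx::future<double> total = hpx::when_all(lo, hi).then(
        [](auto both) {
            auto results = both.get();   // tuple of ready futures
            return hpx::get<0>(results).get() + hpx::get<1>(results).get();
        });

    std::cout << "sum = " << total.get() << "\n";
    return 0;
}

In a real application the two tasks would typically be a local computation and a remote operation (or an action on another locality), so that communication proceeds while computation continues; the structure of the code stays the same.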
