Paper Information - SharP: Towards Programming Extreme-Scale Systems with Hierarchical Heterogeneous Memory


Pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this architectural trend is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, such a system typically has a high-performing network and a compute accelerator. This architecture is effective not only for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Although the system architecture supports the convergence of Big-Compute and Big-Data, programming models have yet to evolve to support either hierarchical-heterogeneous memory systems or this convergence. In this work, we propose and develop a programming abstraction called the SHARed data-structure centric Programming abstraction (SharP) to address both of these goals, i.e., to provide (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications. To evaluate SharP, we implement a Stencil benchmark using SharP, port QMCPack, a petascale-capable application, and adapt the Memcached ecosystem, a popular Big-Data framework, to use SharP, and we quantify the performance and productivity advantages. Additionally, we demonstrate the simplicity of using SharP on different memories, including DRAM, High-Bandwidth Memory (HBM), and non-volatile random access memory (NVRAM).

Manjunath Gorentla Venkata | Ferrol Aderholdt | Zachary W. Parchman
