User Extensible Heap Manager for Heterogeneous Memory Platforms and Mixed Memory Policies

Memory management software requires additional sophistication to handle the array of new hardware technologies coming to market: on-package addressable memory, stacked DRAM, non-volatile high-capacity DIMMs, and low-latency on-package fabric. As a complement to these hardware improvements, many policy features can be applied to virtual memory within the framework of the Linux system calls mmap(2), mbind(2), madvise(2), mprotect(2), and mlock(2). These policy features can support a wide range of future hardware capabilities, including bandwidth control, latency control, inter-process sharing, inter-node sharing, accelerator sharing, persistence, checkpointing, and encryption. The combinatorial range implied by a platform with heterogeneous memory hardware and many options for operating system policies applied to that hardware is enormous, so it is intractable to have a separate custom allocator for each combination. Each layer of the application software stack may have a variety of different requirements for memory properties; some of those properties will be shared between clients, and some will be unique to a client. We propose software that enables fine-grained client control over memory properties through our User Extensible Heap Manager, which efficiently reuses memory modified by expensive system calls and remains effective in a highly threaded environment.
