Runtime Techniques for Programming with Fast and Slow Memory

In today's supercomputers, growth in memory capacity lags substantially behind growth in computing power. To alleviate the effect of this gap, diverse options are being explored, such as non-volatile memory (NVM), which is less expensive but slow, and high bandwidth memory (HBM), which is fast but expensive. In this paper, we present a common approach that uses parallel runtime techniques to utilize NVM and HBM as extensions of the existing memory hierarchy. We evaluate our approach using a matrix-matrix multiplication kernel implemented in CHARM++ and show that applications with memory requirements four times the HBM/DRAM capacity can be executed efficiently using significantly fewer total resources.
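
To make the idea of running a kernel whose data exceeds the fast memory more concrete, the following is a minimal sketch in plain Python, not the CHARM++ implementation evaluated in the paper. It assumes a hypothetical two-tier model: a dictionary stands in for the large, slow tier, and a small least-recently-used buffer (`FastTileCache`, a name invented here) stands in for the fast tier. The names `split_tiles` and `tiled_matmul`, the LRU policy, and the write-back-everything eviction are all illustrative assumptions.

```python
from collections import OrderedDict


class FastTileCache:
    """Bounded 'fast' tier that stages tiles from a 'slow' tier (hypothetical model)."""

    def __init__(self, slow_store, capacity_tiles):
        self.slow = slow_store          # dict: key -> tile, stands in for slow memory
        self.capacity = capacity_tiles  # maximum number of tiles resident in fast memory
        self.fast = OrderedDict()       # resident tiles, least recently used first
        self.fetches = 0                # count of slow-to-fast copies

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key]
        if len(self.fast) >= self.capacity:
            # Evict the least recently used tile and write it back.
            # (A real runtime would track dirty tiles; this sketch writes back all.)
            old_key, old_tile = self.fast.popitem(last=False)
            self.slow[old_key] = old_tile
        tile = [row[:] for row in self.slow[key]]  # the copy models data movement
        self.fast[key] = tile
        self.fetches += 1
        return tile

    def flush(self):
        """Write every resident tile back to the slow tier."""
        for key, tile in self.fast.items():
            self.slow[key] = tile
        self.fast.clear()


def split_tiles(name, matrix, b):
    """Cut a square matrix into b-by-b tiles keyed by (name, block_row, block_col)."""
    nb = len(matrix) // b
    return {
        (name, i, j): [row[j * b:(j + 1) * b] for row in matrix[i * b:(i + 1) * b]]
        for i in range(nb)
        for j in range(nb)
    }


def tiled_matmul(A, B, b, capacity_tiles):
    """Compute C = A * B tile by tile with at most capacity_tiles tiles in fast memory."""
    n = len(A)
    if n % b != 0 or capacity_tiles < 3:
        raise ValueError("need n divisible by b and room for one A, B and C tile")
    nb = n // b
    slow = {}
    slow.update(split_tiles("A", A, b))
    slow.update(split_tiles("B", B, b))
    slow.update(split_tiles("C", [[0] * n for _ in range(n)], b))
    cache = FastTileCache(slow, capacity_tiles)

    for i in range(nb):
        for j in range(nb):
            for k in range(nb):
                a = cache.get(("A", i, k))
                bt = cache.get(("B", k, j))
                c = cache.get(("C", i, j))
                for r in range(b):
                    for s in range(b):
                        c[r][s] += sum(a[r][t] * bt[t][s] for t in range(b))
    cache.flush()

    # Reassemble the full result from the tiles held in the slow tier.
    C = [[0] * n for _ in range(n)]
    for i in range(nb):
        for j in range(nb):
            tile = slow[("C", i, j)]
            for r in range(b):
                C[i * b + r][j * b:(j + 1) * b] = tile[r]
    return C, cache.fetches
```

For example, calling `tiled_matmul(A, B, 2, 3)` on 4x4 matrices keeps at most 3 of the 12 tiles resident at once, so the total data is four times the modeled fast capacity. The returned fetch count gives a rough sense of the data movement a runtime would need to hide, for instance by prefetching tiles before the computation that uses them is scheduled.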
