Exploring Application Performance on Emerging Hybrid-Memory Supercomputers

Next-generation supercomputers will feature increasingly hierarchical and heterogeneous memory systems, with different memory technologies working side by side. A critical question is whether, at large scale, existing HPC applications and emerging data-analytics workloads will see performance improvements or degradation on these systems. We propose a systematic and fair methodology to identify the trend of application performance on emerging hybrid-memory systems. We model the memory system of next-generation supercomputers as a combination of "fast" and "slow" memories. We then analyze the performance and dynamic execution characteristics of a variety of workloads, from traditional scientific applications to emerging data analytics, to compare traditional and hybrid-memory systems. Our results show that data-analytics applications can clearly benefit from the new system design, especially at large scale. Moreover, hybrid-memory systems do not penalize traditional scientific applications, which may also see performance improvements.
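The fast/slow abstraction described in the abstract can be illustrated with a minimal sketch. The model below is not the paper's actual methodology: the latency and bandwidth figures are hypothetical placeholders for an HBM-like fast tier and an NVM-like slow tier, and the weighted-average latency and harmonic-mean bandwidth formulas are simple assumptions used only to show how a fast/slow split shapes effective memory behavior.

```c
/*
 * Minimal sketch (illustrative, not the paper's model): estimate the
 * effective access latency and streaming bandwidth of a hybrid
 * "fast + slow" memory system, assuming a fraction f of accesses is
 * served from the fast tier. All device figures are placeholders.
 */
#include <stdio.h>

int main(void) {
    /* Hypothetical device characteristics. */
    const double lat_fast_ns = 100.0;   /* e.g., on-package HBM-like memory */
    const double lat_slow_ns = 300.0;   /* e.g., NVM-like capacity memory   */
    const double bw_fast_gbs = 400.0;
    const double bw_slow_gbs = 60.0;

    /* Sweep the fraction of accesses hitting the fast tier. */
    for (double f = 0.0; f <= 1.0001; f += 0.25) {
        /* Latency: simple access-weighted average. */
        double eff_lat = f * lat_fast_ns + (1.0 - f) * lat_slow_ns;
        /* Bandwidth: harmonic (byte-weighted) mean for streaming traffic. */
        double eff_bw  = 1.0 / (f / bw_fast_gbs + (1.0 - f) / bw_slow_gbs);
        printf("fast fraction %.2f -> latency %6.1f ns, bandwidth %6.1f GB/s\n",
               f, eff_lat, eff_bw);
    }
    return 0;
}
```

Compiled and run as a standalone program, the sketch prints how effective latency and bandwidth change as more of the working set is served from fast memory, which is roughly the trend the study examines at application level rather than with this toy formula.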
