Towards optimal scheduling policy for heterogeneous memory architecture in many-core system

With the advent of Intel's second-generation many-core processor, Knights Landing (KNL), high-bandwidth memory (HBM) offering potentially five times the bandwidth of conventional dynamic random-access memory (DRAM) has become available as a computing resource for high-performance computing (HPC) applications. Resource management schemes should therefore consider central processing unit (CPU) cores, conventional main memory, and this newly available HBM together to improve overall system throughput and user response time. In this paper, we present a profiling mechanism that analyzes the resource usage patterns of various HPC workloads, together with a related scheduling policy. By carefully allocating memory-intensive workloads to HBM on KNL, we show that the overall performance of multiple Message Passing Interface (MPI) workloads can be improved in terms of execution time and system utilization. We evaluate and verify the effectiveness of our scheme for optimizing the use of HBM using the NAS Parallel Benchmarks.
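
The idea of steering memory-intensive workloads toward HBM can be illustrated with a minimal sketch. It is not the scheduling policy of this paper, whose details are not given in the abstract. It assumes a simple greedy rule: workloads are ranked by a profiled bandwidth demand, and each is placed in HBM while capacity remains. The `Workload` fields, the `assign_memory` function, the 16 GB capacity default, and the sample numbers are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

# Assumed HBM capacity in GB; adjust to the actual system configuration.
HBM_CAPACITY_GB = 16.0


@dataclass(frozen=True)
class Workload:
    name: str
    bandwidth_gbs: float  # profiled memory bandwidth demand (hypothetical metric)
    footprint_gb: float   # memory footprint the workload needs resident


def assign_memory(workloads, hbm_capacity_gb=HBM_CAPACITY_GB):
    """Greedy placement: the most bandwidth-hungry workloads get HBM first.

    Returns a dict mapping workload name to "HBM" or "DRAM".
    """
    placement = {}
    free_gb = hbm_capacity_gb
    # Visit workloads from highest to lowest bandwidth demand.
    for w in sorted(workloads, key=lambda w: w.bandwidth_gbs, reverse=True):
        if w.footprint_gb <= free_gb:
            placement[w.name] = "HBM"
            free_gb -= w.footprint_gb
        else:
            # Not enough HBM left; fall back to conventional main memory.
            placement[w.name] = "DRAM"
    return placement


# Illustrative values only, not measurements.
jobs = [
    Workload("job_a", bandwidth_gbs=80.0, footprint_gb=10.0),
    Workload("job_b", bandwidth_gbs=60.0, footprint_gb=8.0),
    Workload("job_c", bandwidth_gbs=20.0, footprint_gb=4.0),
]
placement = assign_memory(jobs)
```

A real policy would likely also weigh core allocation and run-time behavior; this sketch considers only bandwidth demand and memory footprint.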
