Interference-aware co-scheduling method based on classification of application characteristics from hardware performance counter using data mining

Computational scientists and engineers who want the best performance from scientific applications need efficient application characterization methods to exploit high-performance hardware resources. However, modern processors increasingly ship with high-bandwidth on-chip memory and large numbers of cores, and characterization research that accounts for these newly introduced hardware features in next-generation high-performance computing environments remains scarce and is complex to carry out. In this paper, we propose a simple and fast method to classify application characteristics on systems with state-of-the-art processors. The proposed method uses hardware performance counters to monitor hardware events related to system performance, and adopts a clustering approach that requires only a limited understanding of the correlation between hardware events and application characteristics. We apply the characterization technique to the NAS Parallel Benchmarks on two systems, one based on Intel Knights Landing and one on Intel Skylake Xeon processors. We demonstrate that the proposed technique captures system and application characteristics and provides users with useful insights into application execution.
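As an illustration of the clustering idea described above, the following is a minimal sketch and not the method evaluated in the paper. It assumes k-means as the clustering algorithm (the abstract does not name one), and the application names, counter metrics, and values are hypothetical placeholders rather than measured data.

```python
import math

# Hypothetical per-application metrics derived from hardware counters:
# [LLC misses per 1000 instructions, memory bandwidth (GB/s), IPC].
# These values are illustrative placeholders, not measurements.
SAMPLES = {
    "compute_1": [0.2, 1.5, 2.1],
    "compute_2": [0.8, 4.0, 1.8],
    "memory_1": [18.0, 55.0, 0.5],
    "memory_2": [22.0, 70.0, 0.4],
}


def normalize(rows):
    """Min-max scale each column to [0, 1] so no single counter dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def kmeans(points, k, max_iterations=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments are stable, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def cluster_applications(samples, k=2):
    """Return a mapping from application name to cluster label."""
    names = list(samples)
    points = normalize([samples[name] for name in names])
    return dict(zip(names, kmeans(points, k)))


if __name__ == "__main__":
    for name, label in cluster_applications(SAMPLES).items():
        print(f"{name}: cluster {label}")
```

Scaling the metrics before clustering matters here because raw counter values differ by orders of magnitude. The cluster labels themselves carry no meaning; only the grouping of applications does.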
