Multi-Parameter Performance Modeling Based on Machine Learning with Basic Block Features

Given the increasing complexity and scale of HPC architectures and software, performance modeling of parallel applications on large-scale HPC platforms has become increasingly important. It plays a key role in many areas, such as performance analysis, job management, and resource estimation. In this work, we propose MPerfPred, a multi-parameter performance modeling and prediction framework that uses basic block frequencies as features and applies machine learning algorithms to automatically construct multi-parameter performance models with high generalization ability. To reduce prediction overhead, we propose feature-filtering strategies that shrink the feature set in the training stage, and we build a serial program, called the BBF collector, for each target application to quickly collect feature values in the prediction stage. We demonstrate MPerfPred on the TianHe-2 supercomputer with six parallel applications. Results show that MPerfPred with support vector regression (SVR) achieves better predictions than other input-parameter-based modeling methods, with an average prediction error of 8.42% and an average standard deviation of prediction errors of 6.09%. In the prediction stage, the average prediction overhead of MPerfPred is less than 0.13% of the total execution time.
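As a rough illustration of the modeling step described above, the following sketch trains an SVR model on basic-block-frequency features to predict execution time. The function name, feature-filtering step, hyperparameters, and synthetic data are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an MPerfPred-style modeling step:
# rows = profiled runs, columns = basic block execution frequencies,
# target = measured execution time. Names and thresholds are illustrative.
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def train_perf_model(bb_frequencies: np.ndarray, exec_times: np.ndarray):
    """Fit an SVR performance model from basic block frequency features."""
    model = make_pipeline(
        VarianceThreshold(threshold=0.0),  # drop constant features (a simple form of feature filtering)
        StandardScaler(),                  # scale frequencies before kernel SVR
        SVR(kernel="rbf", C=10.0, epsilon=0.1),
    )
    model.fit(bb_frequencies, exec_times)
    return model


# Example usage with synthetic data standing in for profiled runs.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 1000, size=(40, 200)).astype(float)  # 40 runs, 200 basic blocks
    y = X @ rng.random(200) * 1e-4                            # synthetic execution times
    model = train_perf_model(X, y)
    print("Predicted time for a new run:", model.predict(X[:1])[0])
```

In a real deployment, the feature matrix would come from the BBF collector and the filtering stage would be replaced by the paper's feature-filtering strategies.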
