PADDLE: Performance Analysis Using a Data-Driven Learning Environment

The use of machine learning techniques to model execution time and power consumption and, more generally, to characterize performance data is gaining traction in the HPC community. Although this signals great potential for automating complex inference tasks, a typical analytics pipeline requires selecting and extensively tuning multiple components, ranging from feature learning to statistical inference to visualization. Further, algorithmic solutions often do not generalize across problems, which makes it cumbersome to design and validate machine learning techniques in practice. To address these challenges, we propose PADDLE, a unified machine learning framework designed specifically for problems encountered in the analysis of HPC data. The framework uses an information-theoretic approach to hierarchical feature learning and can produce highly robust and interpretable models. We present user-centric workflows for using PADDLE and demonstrate its effectiveness in three scenarios: (a) identifying causes of network congestion; (b) determining the best-performing linear solver for sparse matrices; and (c) comparing performance characteristics of parent and proxy application pairs.
