Performance Modeling under Resource Constraints Using Deep Transfer Learning

Tuning application parameters for optimal performance is a challenging combinatorial problem. Techniques for modeling the functional relationship between input features in the parameter space and application performance are therefore important. We show that simple statistical inference techniques are inadequate to capture this relationship. Even with more complex ensembles of models, the minimum coverage of the parameter space required via experimental observations remains large. We propose a deep learning-based approach that combines exhaustive observations collected at a smaller scale with limited observations collected at a larger target scale. The proposed approach accurately predicts performance in the regimes of interest to performance analysts and outperforms many traditional techniques. In particular, it can identify the best-performing configurations even when trained on as few as 1% of the observations at the target scale.
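To make the transfer-learning idea concrete, the sketch below shows one plausible realization: a small fully connected network is pretrained on exhaustive small-scale observations and then fine-tuned, with its early layers frozen, on a handful of target-scale observations. This is a minimal illustration assuming a PyTorch workflow; the data names (`small_X`, `target_X`), the architecture, and the training schedule are illustrative placeholders, not the authors' actual implementation.

```python
# A minimal sketch of a pretrain-then-fine-tune transfer workflow.
# Hypothetical inputs: small_X/small_y are exhaustive observations at the
# smaller scale; target_X/target_y are the ~1% of observations available
# at the larger target scale. Architecture and schedule are assumptions.
import torch
import torch.nn as nn


def make_model(n_features: int) -> nn.Sequential:
    # Small fully connected regressor: parameter configuration -> predicted
    # performance (a single scalar).
    return nn.Sequential(
        nn.Linear(n_features, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )


def train(model: nn.Module, X: torch.Tensor, y: torch.Tensor,
          epochs: int, lr: float) -> nn.Module:
    # Optimize only the parameters that are still trainable, so the same
    # routine serves both pretraining and fine-tuning.
    opt = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X).squeeze(-1), y)
        loss.backward()
        opt.step()
    return model


# Synthetic stand-ins for the two data sets (8 tunable parameters).
small_X, small_y = torch.randn(2000, 8), torch.randn(2000)
target_X, target_y = torch.randn(20, 8), torch.randn(20)

# 1) Pretrain on the exhaustive small-scale observations.
model = train(make_model(8), small_X, small_y, epochs=200, lr=1e-3)

# 2) Transfer: freeze the early layers so the parameter-space structure
#    learned at small scale is retained, then fine-tune only the output
#    layer on the few target-scale observations.
for layer in list(model.children())[:-1]:
    for p in layer.parameters():
        p.requires_grad = False
model = train(model, target_X, target_y, epochs=100, lr=1e-3)
```

In this setup only the final layer is updated during fine-tuning, which is what lets a few target-scale samples (the 1% regime mentioned above) suffice: the structure learned from exhaustive small-scale data is reused rather than relearned at the target scale.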
