Bliss: auto-tuning complex applications using a pool of diverse lightweight learning models

As parallel applications grow more complex, auto-tuning becomes more desirable, more challenging, and more time-consuming. We propose Bliss, a novel solution for auto-tuning parallel applications that requires no a priori information about the application, no domain-specific knowledge, and no instrumentation. Bliss leverages a pool of lightweight Bayesian optimization models to find a near-optimal parameter setting 1.64× faster than state-of-the-art approaches.
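To make the pool-of-models idea concrete, below is a minimal sketch of the general technique the abstract describes: maintain several lightweight surrogate models, track each model's prediction error on newly evaluated configurations, and let the currently best-performing model pick the next parameter setting to try. This is an illustrative toy, not Bliss's actual implementation; the synthetic `objective`, the three surrogates, and all parameter choices here are assumptions made for the example.

```python
import random

def objective(x):
    # Synthetic "application runtime" for a single tunable parameter x in [0, 10];
    # stands in for an expensive run of the real application.
    return (x - 3.7) ** 2 + 1.0

# A pool of diverse lightweight surrogates, each predicting runtime from samples.
def nearest_neighbor(samples, x):
    # Predict the runtime of the closest previously evaluated configuration.
    return min(samples, key=lambda s: abs(s[0] - x))[1]

def global_mean(samples, x):
    # Crude baseline: the mean of all observed runtimes.
    return sum(y for _, y in samples) / len(samples)

def inverse_distance(samples, x):
    # Inverse-distance-weighted interpolation of observed runtimes.
    num = den = 0.0
    for xi, yi in samples:
        w = 1.0 / (abs(xi - x) + 1e-9)
        num += w * yi
        den += w
    return num / den

def tune(n_iters=30, seed=0):
    rng = random.Random(seed)
    pool = [nearest_neighbor, global_mean, inverse_distance]
    errors = [0.0] * len(pool)  # accumulated prediction error per model
    samples = [(x, objective(x)) for x in (rng.uniform(0, 10) for _ in range(3))]
    for _ in range(n_iters):
        # Delegate the next choice to the model with the lowest error so far.
        model = pool[min(range(len(pool)), key=lambda i: errors[i])]
        candidates = [rng.uniform(0, 10) for _ in range(50)]
        x = min(candidates, key=lambda c: model(samples, c))
        y = objective(x)
        # Score every model on the new point before adding it to the history,
        # so each model is judged on a configuration it has not yet seen.
        for i, m in enumerate(pool):
            errors[i] += abs(m(samples, x) - y)
        samples.append((x, y))
    return min(samples, key=lambda s: s[1])

best_x, best_y = tune()
print(f"best parameter ~ {best_x:.2f}, runtime ~ {best_y:.2f}")
```

A full Bayesian optimization variant would replace these surrogates with probabilistic models (e.g., Gaussian processes) and pick candidates via an acquisition function such as expected improvement, but the model-pool bookkeeping stays the same.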
