Bliss: auto-tuning complex applications using a pool of diverse lightweight learning models

As parallel applications grow more complex, auto-tuning becomes more desirable, more challenging, and more time-consuming. We propose Bliss, a novel solution for auto-tuning parallel applications that requires no a priori information about the application, no domain-specific knowledge, and no instrumentation. Bliss leverages a pool of lightweight Bayesian optimization models to find a near-optimal parameter setting 1.64× faster than state-of-the-art approaches.
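To make the pool-of-models idea concrete, below is a minimal sketch of the general technique the abstract describes: maintain several lightweight surrogate models, track each model's prediction error on newly evaluated configurations, and let the currently best-performing model pick the next parameter setting to try. This is an illustrative toy, not Bliss's actual implementation; the synthetic `objective`, the three surrogates, and all parameter choices here are assumptions made for the example.

```python
import random

def objective(x):
    # Synthetic "application runtime" for a single tunable parameter x in [0, 10];
    # stands in for an expensive run of the real application.
    return (x - 3.7) ** 2 + 1.0

# A pool of diverse lightweight surrogates, each predicting runtime from samples.
def nearest_neighbor(samples, x):
    # Predict the runtime of the closest previously evaluated configuration.
    return min(samples, key=lambda s: abs(s[0] - x))[1]

def global_mean(samples, x):
    # Crude baseline: the mean of all observed runtimes.
    return sum(y for _, y in samples) / len(samples)

def inverse_distance(samples, x):
    # Inverse-distance-weighted interpolation of observed runtimes.
    num = den = 0.0
    for xi, yi in samples:
        w = 1.0 / (abs(xi - x) + 1e-9)
        num += w * yi
        den += w
    return num / den

def tune(n_iters=30, seed=0):
    rng = random.Random(seed)
    pool = [nearest_neighbor, global_mean, inverse_distance]
    errors = [0.0] * len(pool)  # accumulated prediction error per model
    samples = [(x, objective(x)) for x in (rng.uniform(0, 10) for _ in range(3))]
    for _ in range(n_iters):
        # Delegate the next choice to the model with the lowest error so far.
        model = pool[min(range(len(pool)), key=lambda i: errors[i])]
        candidates = [rng.uniform(0, 10) for _ in range(50)]
        x = min(candidates, key=lambda c: model(samples, c))
        y = objective(x)
        # Score every model on the new point before adding it to the history,
        # so each model is judged on a configuration it has not yet seen.
        for i, m in enumerate(pool):
            errors[i] += abs(m(samples, x) - y)
        samples.append((x, y))
    return min(samples, key=lambda s: s[1])

best_x, best_y = tune()
print(f"best parameter ~ {best_x:.2f}, runtime ~ {best_y:.2f}")
```

A full Bayesian optimization variant would replace these surrogates with probabilistic models (e.g., Gaussian processes) and pick candidates via an acquisition function such as expected improvement, but the model-pool bookkeeping stays the same.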
