Asynchronous Distributed Bayesian Optimization at HPC Scale

Bayesian optimization (BO) is a widely used approach for computationally expensive black-box optimization, such as simulator calibration and hyperparameter optimization of deep learning methods. In BO, a dynamically updated, computationally cheap surrogate model is employed to learn the input-output relationship of the black-box function; this surrogate model is then used to explore and exploit the promising regions of the input space. Multipoint BO methods adopt a single-manager/multiple-workers strategy to achieve high-quality solutions in a shorter time. However, the computational overhead of multipoint generation schemes is a major bottleneck in designing BO methods that can scale to thousands of workers. We present an asynchronous-distributed BO (ADBO) method in which each worker runs a search and asynchronously exchanges the input-output values of its black-box evaluations with all other workers, without a manager. We scale our method up to 4,096 workers and demonstrate improved solution quality and faster convergence. We demonstrate the effectiveness of our approach by tuning the hyperparameters of neural networks from the Exascale Computing Project CANDLE benchmarks.
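To make the manager-free scheme described above concrete, the following is a minimal sketch of one possible asynchronous distributed BO loop, assuming mpi4py and scikit-learn are available. The objective `black_box`, the search space, the random-forest surrogate, and the lower-confidence-bound acquisition are placeholder choices for illustration only, not the components or benchmarks used in the paper.

```python
# Minimal sketch (assumptions: mpi4py, scikit-learn; placeholder objective).
import numpy as np
from mpi4py import MPI
from sklearn.ensemble import RandomForestRegressor

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
rng = np.random.default_rng(seed=rank)

DIM, N_CANDIDATES, N_ITERATIONS, TAG = 5, 256, 30, 77

def black_box(x):
    """Placeholder for the expensive simulation or training run (minimized)."""
    return float(np.sum((x - 0.25) ** 2))

X, y, pending = [], [], []          # evaluation history and outstanding sends

for _ in range(2):                  # random seed points for the surrogate
    x0 = rng.uniform(0.0, 1.0, DIM)
    X.append(x0); y.append(black_box(x0))

for it in range(N_ITERATIONS):
    # 1. Refit a cheap surrogate on all evaluations seen so far.
    surrogate = RandomForestRegressor(n_estimators=32, random_state=it)
    surrogate.fit(np.asarray(X), np.asarray(y))

    # 2. Choose the next point by minimizing a lower-confidence-bound
    #    acquisition over random candidates (mean/std across the trees).
    cand = rng.uniform(0.0, 1.0, (N_CANDIDATES, DIM))
    preds = np.stack([tree.predict(cand) for tree in surrogate.estimators_])
    lcb = preds.mean(axis=0) - 1.96 * preds.std(axis=0)
    x_next = cand[int(np.argmin(lcb))]

    # 3. Evaluate the black box locally and record the result.
    y_next = black_box(x_next)
    X.append(x_next); y.append(y_next)

    # 4. Share the new (input, output) pair with every other worker using
    #    nonblocking sends; no manager process is involved.
    for dest in range(size):
        if dest != rank:
            pending.append(comm.isend((x_next.tolist(), y_next), dest=dest, tag=TAG))

    # 5. Drain whatever peer results have arrived, without blocking.
    while comm.iprobe(source=MPI.ANY_SOURCE, tag=TAG):
        x_recv, y_recv = comm.recv(source=MPI.ANY_SOURCE, tag=TAG)
        X.append(np.asarray(x_recv)); y.append(y_recv)

for req in pending:                 # complete outstanding sends
    req.wait()
print(f"[rank {rank}] best objective after {N_ITERATIONS} iterations: {min(y):.4f}")
```

Such a sketch would be launched with, for example, `mpiexec -n 4 python adbo_sketch.py` (file name hypothetical). The design point it mirrors from the abstract is that every worker owns a complete copy of the search and only exchanges (input, output) pairs with its peers, so no single manager process can become a bottleneck as the worker count grows.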
