Sparsifying to optimize over multiple information sources: an augmented Gaussian process based algorithm

Optimizing a black-box, expensive, and multi-extremal function, given multiple approximations of it, is a challenging task known as multi-information source optimization (MISO), where each source has a different cost and the level of approximation (aka fidelity) of each source can change over the search space. While most current approaches fuse the Gaussian processes (GPs) that model the individual sources, we propose to use GP sparsification to select only “reliable” function evaluations from all the sources. These selected evaluations are used to build an augmented Gaussian process (AGP), so called because the evaluations on the most expensive source are augmented with the reliable evaluations from the less expensive sources. We also propose a new acquisition function, based on the confidence bound, that accounts for both the cost of the next source to query and the location-dependent approximation quality of that source. This approximation quality is estimated through a model-discrepancy measure and the prediction uncertainty of the single-source GPs. MISO-AGP and a MISO fused-GP counterpart are compared on two test problems and on the hyperparameter optimization of a machine learning classifier on a large dataset.
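To make the acquisition idea concrete, the following is a minimal sketch of a cost- and discrepancy-aware confidence bound, assuming scikit-learn's GaussianProcessRegressor for the per-source GPs. The helper names (discrepancy, cost_aware_lcb) and the specific way cost and discrepancy enter the bound are illustrative assumptions, not the paper's exact MISO-AGP formulation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def discrepancy(gp_hifi, gp_cheap, x):
    """Location-dependent model discrepancy: absolute difference between the
    posterior means of the high-fidelity GP and a cheaper source's GP at x."""
    mu_h = gp_hifi.predict(x.reshape(1, -1))
    mu_c = gp_cheap.predict(x.reshape(1, -1))
    return float(np.abs(mu_h - mu_c))

def cost_aware_lcb(gp_hifi, gp_cheap, x, cost, beta=2.0):
    """Lower-confidence-bound acquisition (for minimization), penalized by the
    query cost of the cheaper source and its discrepancy at x; lower is better.
    The additive penalty is an illustrative choice, not the authors' formula."""
    mu, sigma = gp_cheap.predict(x.reshape(1, -1), return_std=True)
    lcb = float(mu) - beta * float(sigma)
    return lcb + cost * (1.0 + discrepancy(gp_hifi, gp_cheap, x))

# Toy usage: two sources of the same 1-D function, the cheap one noisier.
rng = np.random.default_rng(0)
X_h = rng.uniform(0, 1, (5, 1));  y_h = np.sin(6 * X_h).ravel()
X_c = rng.uniform(0, 1, (20, 1)); y_c = np.sin(6 * X_c).ravel() + 0.1 * rng.standard_normal(20)
gp_hifi = GaussianProcessRegressor(normalize_y=True).fit(X_h, y_h)
gp_cheap = GaussianProcessRegressor(normalize_y=True).fit(X_c, y_c)
print(cost_aware_lcb(gp_hifi, gp_cheap, np.array([0.3]), cost=0.2))
```

In this sketch a candidate point on a cheap source is attractive only if its optimistic (lower-bound) value outweighs both the querying cost and how far that source's prediction deviates from the high-fidelity model at that location.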
