Preferential Batch Bayesian Optimization

Most research in Bayesian optimization (BO) has focused on direct feedback scenarios, where one has access to exact values of some expensive-to-evaluate objective. This direction has been driven mainly by the use of BO in machine learning hyperparameter tuning problems. However, in domains such as modelling human preferences, A/B tests, or recommender systems, there is a need for methods that replace direct feedback with preferential feedback, obtained via rankings or pairwise comparisons. In this work, we present preferential batch Bayesian optimization (PBBO), a new framework for finding the optimum of a latent function of interest given any type of parallel preferential feedback over groups of two or more points. We do so by using a Gaussian process model with a likelihood specially designed to enable the parallel and efficient data collection mechanisms that are key in modern machine learning. We show how the acquisition functions developed under this framework generalize and augment previous approaches in Bayesian optimization, expanding the use of these techniques to a wider range of domains. An extensive empirical study on both simulated functions and four real data sets demonstrates the benefits of this approach.
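To make the idea of batch preferential feedback concrete, the sketch below shows one common way such feedback can be modelled: a latent utility drawn from a Gaussian process, with the reported "winner" of a batch being the point whose noisy utility is largest (a Thurstone-style random-utility model). This is a minimal illustration under assumed choices, not the paper's exact likelihood; the kernel, noise scale, and the names `rbf_kernel` and `batch_winner_probabilities` are hypothetical.

```python
import numpy as np


def rbf_kernel(X1, X2, lengthscale=0.5, variance=1.0):
    """Squared-exponential (RBF) covariance between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


def batch_winner_probabilities(X_batch, n_samples=2000, noise_scale=0.1, seed=0):
    """Monte Carlo estimate of P(x_i is preferred over the rest of the batch)
    under a zero-mean GP prior on a latent utility f, where the observed
    winner is argmax_i f(x_i) + noise (an illustrative random-utility model)."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(X_batch, X_batch) + 1e-8 * np.eye(len(X_batch))  # jitter for stability
    L = np.linalg.cholesky(K)
    f = L @ rng.standard_normal((len(X_batch), n_samples))   # GP prior draws of the utility
    g = f + noise_scale * rng.standard_normal(f.shape)       # noisy utilities actually compared
    winners = np.argmax(g, axis=0)                            # index preferred in each draw
    return np.bincount(winners, minlength=len(X_batch)) / n_samples


if __name__ == "__main__":
    X = np.array([[0.1], [0.4], [0.9]])       # a batch of three candidate points
    print(batch_winner_probabilities(X))       # roughly equal under the symmetric prior
```

In a full method, probabilities like these would be conditioned on previously observed comparisons and fed into an acquisition rule that proposes the next batch; the snippet only illustrates the feedback model itself.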
