Improving GP-UCB Algorithm by Harnessing Decomposed Feedback

Gaussian processes (GPs) have been widely applied to machine learning and nonparametric approximation. Given existing observations, a GP allows the decision maker to update a posterior belief over the unknown underlying function. Observations from a complex system usually come with noise and with decomposed feedback from intermediate layers. For example, the decomposed feedback could be the components that constitute the final objective value, or the individual readings obtained from sensors. Previous literature has shown that GPs can successfully deal with noise, but has neglected decomposed feedback. We therefore propose a decomposed GP regression algorithm that incorporates this feedback, leading to lower average root-mean-squared error with respect to the target function, especially when samples are scarce. We also introduce a decomposed GP-UCB algorithm to solve the resulting bandit problem with decomposed feedback. We prove that our algorithm converges to the optimal solution and preserves the no-regret property. To demonstrate the wide applicability of this work, we run our algorithm on two disparate social problems: infectious disease control and weather monitoring. The numerical results show that our method significantly improves on previous methods that do not use this feedback, showcasing the advantage of considering decomposed feedback.
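The idea of regressing on components and then acting on an upper confidence bound can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the target is a plain sum of J components, that each component is observed at every sample point, that the components have independent GP priors with a unit-variance RBF kernel, and that `beta` is a fixed exploration weight. All function names and parameter values are hypothetical.

```python
import numpy as np

def rbf(a, b, length=0.2):
    # Squared-exponential kernel with unit prior variance (assumed kernel choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2, length=0.2):
    # Standard GP regression posterior mean and variance on 1-D inputs.
    K = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test, length)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative values from round-off

def decomposed_posterior(x_train, Y_train, x_test, **kw):
    # Y_train has shape (n, J): one column of observations per component.
    # Assumption: independent components, so means and variances simply add.
    parts = [gp_posterior(x_train, Y_train[:, j], x_test, **kw)
             for j in range(Y_train.shape[1])]
    mean = np.sum([m for m, _ in parts], axis=0)
    var = np.sum([s for _, s in parts], axis=0)
    return mean, var

def ucb_pick(mean, var, beta=4.0):
    # Index of the candidate with the largest upper confidence bound.
    return int(np.argmax(mean + np.sqrt(beta * var)))
```

In a bandit loop, one would evaluate the system at the candidate returned by `ucb_pick`, append the new row of component observations to `Y_train`, and repeat. A baseline that ignores decomposed feedback corresponds to calling `gp_posterior` on the row sums `Y_train.sum(axis=1)` instead.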
