A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions

Abstract: This tutorial introduces Gaussian process regression as an expressive tool for modelling, actively exploring, and exploiting unknown functions. Gaussian process regression is a non-parametric Bayesian approach to regression problems that can be used in exploration and exploitation scenarios. The tutorial aims to provide an accessible introduction to these techniques. We introduce Gaussian processes, which define distributions over functions and can be used for Bayesian non-parametric regression, and demonstrate their use in applications and didactic examples: simple regression problems, a demonstration of kernel-encoded prior assumptions and compositions, a pure exploration scenario within an optimal design framework, and a bandit-like exploration–exploitation scenario where the goal is to recommend movies. We also describe a risk-averse exploration scenario in which an additional constraint (not to sample below a certain threshold) must be taken into account. Lastly, we summarize recent psychological experiments that use Gaussian processes. Software and literature pointers are also provided.
