Gaussian Processes for Regression and Optimisation

Gaussian processes have proved to be useful and powerful constructs for the purposes of regression. The classical method proceeds by parameterising a covariance function and then inferring its parameters given the training data. In this thesis, the classical approach is augmented by interpreting Gaussian processes as the outputs of linear filters excited by white noise. This enables a straightforward definition of dependent Gaussian processes as the outputs of a multiple output linear filter excited by multiple noise sources. We show how dependent Gaussian processes defined in this way can also be used for the purposes of system identification.

One well-known problem with Gaussian process regression is that the computational complexity scales poorly with the amount of training data. We review one approximate solution that alleviates this problem, namely reduced rank Gaussian processes, and show how the reduced rank approximation can be applied to allow for the efficient computation of dependent Gaussian processes.

We then examine the application of Gaussian processes to other machine learning problems. To do so, we review methods for the parameterisation of full covariance matrices. Furthermore, we discuss how improvements can be made by marginalising over alternative models, and introduce methods to perform these computations efficiently. In particular, we introduce sequential annealed importance sampling as a method for calculating model evidence in an on-line fashion as new data arrives.

Gaussian process regression can also be applied to optimisation. We describe an algorithm that uses model comparison between multiple models to find the optimum of a function while taking as few samples as possible. This algorithm shows impressive performance on the standard control problem of double pole balancing. Finally, we describe how Gaussian processes can be used to efficiently estimate gradients of noisy functions and to numerically estimate integrals.
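For concreteness, the sketch below illustrates the classical approach referred to above: a zero-mean Gaussian process with a squared-exponential covariance function, whose posterior mean and variance are computed from noisy training data. The hyperparameter values, function names, and the toy sine data are illustrative assumptions, not taken from the thesis, which considers more general covariance functions and infers the hyperparameters rather than fixing them.

    import numpy as np

    def sq_exp_cov(x1, x2, lengthscale=1.0, signal_var=1.0):
        # Squared-exponential covariance k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2)).
        d = x1[:, None] - x2[None, :]
        return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

    def gp_predict(x_train, y_train, x_test,
                   lengthscale=1.0, signal_var=1.0, noise_var=0.1):
        # Posterior mean and variance of the latent function at the test inputs,
        # for a zero-mean GP prior with Gaussian observation noise.
        K = sq_exp_cov(x_train, x_train, lengthscale, signal_var) \
            + noise_var * np.eye(len(x_train))
        K_s = sq_exp_cov(x_test, x_train, lengthscale, signal_var)
        K_ss = sq_exp_cov(x_test, x_test, lengthscale, signal_var)
        # Cholesky factorisation for numerically stable solves with K.
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
        mean = K_s @ alpha
        v = np.linalg.solve(L, K_s.T)
        var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
        return mean, var

    # Toy usage: noisy samples of a sine function.
    x_train = np.linspace(0, 2 * np.pi, 20)
    y_train = np.sin(x_train) + 0.1 * np.random.randn(20)
    x_test = np.linspace(0, 2 * np.pi, 100)
    mean, var = gp_predict(x_train, y_train, x_test)

Dependent Gaussian processes and the reduced rank approximation discussed in the thesis replace this single covariance function with covariances constructed from linear filters and low-rank factorisations, but the prediction equations retain the same structure.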
