Bayesian Kernel Methods

Bayesian methods allow for a simple and intuitive representation of the function spaces used by kernel methods. This chapter describes the basic principles of Gaussian processes, their implementation, and their connection to other kernel-based Bayesian estimation methods, such as the Relevance Vector Machine.
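As a concrete illustration of the Gaussian process viewpoint discussed in the chapter, the sketch below computes the posterior mean and variance of GP regression with a squared-exponential kernel. It is a minimal NumPy sketch, not the chapter's reference implementation; the kernel choice, length scale, and noise variance are illustrative assumptions.

import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential (RBF) kernel matrix between the rows of A and B.
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise_var=0.1, length_scale=1.0):
    # Posterior mean and variance of a zero-mean GP with Gaussian observation noise.
    K = rbf_kernel(X_train, X_train, length_scale) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale)
    K_ss = rbf_kernel(X_test, X_test, length_scale)
    # Cholesky factorization gives a numerically stable solve of K alpha = y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# Toy usage: noisy samples of a sine function.
X = np.linspace(0, 5, 20)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(20)
X_new = np.linspace(0, 5, 100)[:, None]
mu, sigma2 = gp_posterior(X, y, X_new)

The posterior variance shrinks near the training inputs and grows away from them, which is the property that distinguishes the GP treatment from a point estimate such as kernel ridge regression.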
