Bayesian Kernel Methods

Bayesian methods allow for a simple and intuitive representation of the function spaces used by kernel methods. This chapter describes the basic principles of Gaussian processes, their implementation, and their connection to other kernel-based Bayesian estimation methods, such as the Relevance Vector Machine.
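As a concrete illustration of the Gaussian process viewpoint discussed in the chapter, the sketch below computes the posterior mean and variance of GP regression with a squared-exponential kernel. It is a minimal NumPy sketch, not the chapter's reference implementation; the kernel choice, length scale, and noise variance are illustrative assumptions.

import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential (RBF) kernel matrix between the rows of A and B.
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise_var=0.1, length_scale=1.0):
    # Posterior mean and variance of a zero-mean GP with Gaussian observation noise.
    K = rbf_kernel(X_train, X_train, length_scale) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale)
    K_ss = rbf_kernel(X_test, X_test, length_scale)
    # Cholesky factorization gives a numerically stable solve of K alpha = y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# Toy usage: noisy samples of a sine function.
X = np.linspace(0, 5, 20)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(20)
X_new = np.linspace(0, 5, 100)[:, None]
mu, sigma2 = gp_posterior(X, y, X_new)

The posterior variance shrinks near the training inputs and grows away from them, which is the property that distinguishes the GP treatment from a point estimate such as kernel ridge regression.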
