
Kernel methods in machine learning

We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions facilitates the construction and analysis of learning algorithms while still accommodating large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data. We cover a wide range of methods, from binary classifiers to sophisticated methods for estimation with structured data.
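To make the idea of a function "expanded in terms of a kernel" concrete, here is a minimal sketch, not taken from the paper, in standard-library Python. It assumes a Gaussian kernel k(x, x') = exp(-γ‖x - x'‖²) and a squared-loss objective with an RKHS-norm penalty, min_f Σᵢ (yᵢ - f(xᵢ))² + λ‖f‖², i.e. kernel ridge regression, chosen here only for illustration. For that objective the minimizer is a finite expansion f(x) = Σᵢ αᵢ k(xᵢ, x) with α = (K + λI)⁻¹y, where K is the Gram matrix of the training points. The helper names `gaussian_kernel`, `solve`, `fit_kernel_ridge`, and `predict` are hypothetical.

```python
# Illustrative sketch (hypothetical helper names): kernel ridge regression,
# where the learned function is a kernel expansion over the training points.
import math
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def gaussian_kernel(gamma: float = 1.0) -> Kernel:
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2), a positive definite kernel."""
    def k(x: Sequence[float], z: Sequence[float]) -> float:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
        return math.exp(-gamma * sq_dist)
    return k


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_kernel_ridge(xs: Sequence[Sequence[float]], ys: Sequence[float],
                     k: Kernel, lam: float) -> List[float]:
    """Return alpha = (K + lam * I)^{-1} y, the expansion coefficients of the minimizer of
    sum_i (y_i - f(x_i))^2 + lam * ||f||_H^2. For lam > 0, K + lam * I is positive definite."""
    n = len(xs)
    gram = [[k(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    return solve(gram, list(ys))


def predict(xs: Sequence[Sequence[float]], alpha: Sequence[float],
            k: Kernel, x: Sequence[float]) -> float:
    """Evaluate the kernel expansion f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * k(xi, x) for a, xi in zip(alpha, xs))
```

For example, fitting `xs = [[0.0], [1.0], [2.0], [3.0]]` and `ys = [0.0, 0.8, 0.9, 0.1]` with a very small λ gives a function that nearly interpolates the training targets; increasing λ trades data fit for a smaller RKHS norm. The dense O(n³) solve is only meant for small illustrative problems.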

Alex Smola | Thomas Hofmann | B. Schölkopf
