Kernel Methods for Pattern Analysis

Kernel methods provide a powerful and unified framework for pattern discovery, motivating algorithms that can act on general types of data (e.g. strings, vectors or text) and look for general types of relations (e.g. rankings, classifications, regressions, clusters). The application areas range from neural networks and pattern recognition to machine learning and data mining. This book, developed from lectures and tutorials, fulfils two major roles. Firstly, it provides practitioners with a large toolkit of algorithms, kernels and solutions ready to use for standard pattern discovery problems in fields such as bioinformatics, text analysis and image analysis. Secondly, it provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.
