Learning with Support Vector Machines

Support Vector Machines have become a well-established tool within machine learning. They work well in practice and have been used across a wide range of applications, from recognizing handwritten digits to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of the subject. We start with a simple Support Vector Machine for binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios, such as prediction with real-valued outputs, novelty detection, and the handling of complex output structures such as parse trees. Finally, we give an overview of the main types of kernels used in practice and show how to learn and make predictions from multiple types of input data.

Table of Contents: Support Vector Machines for Classification / Kernel-based Models / Learning with Kernels
