Trends & Controversies: Support Vector Machines

My first exposure to Support Vector Machines came this spring, when I heard Sue Dumais present impressive results on text categorization using this technique. This issue's collection of essays should help familiarize our readers with this interesting new racehorse in the machine learning stable. In an introductory overview, Bernhard Schölkopf points out a particular advantage of SVMs over other learning algorithms: they can be analyzed theoretically using concepts from computational learning theory, and they also achieve good performance on real problems. Two authors provide examples of such real-world applications. Sue Dumais describes the aforementioned text-categorization work, which yielded the best results to date on the Reuters collection. Edgar Osuna presents strong results on face detection. Our fourth author, John Platt, gives us a practical guide and a new technique for implementing the algorithm efficiently.