Support Vector Machines with Applications

Support vector machines (SVMs) appeared in the early nineties as optimal margin classifiers in the context of Vapnik's statistical learning theory. Since then, SVMs have been successfully applied to real-world data analysis problems, often providing improved results compared with other techniques. SVMs operate within the framework of regularization theory by minimizing an empirical risk in a well-posed and consistent way. A clear advantage of the support vector approach is that sparse solutions to classification and regression problems are usually obtained: only a few samples are involved in the determination of the classification or regression functions. This fact facilitates the application of SVMs to problems that involve a large amount of data, such as text processing and bioinformatics tasks. This paper is intended as an introduction to SVMs and their applications, emphasizing their key features. In addition, some algorithmic extensions and illustrative real-world applications of SVMs are presented.
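The sparsity of SVM solutions can be seen in a small experiment. The sketch below is illustrative only and is not the method of this paper. It assumes a linear SVM with hinge loss, min_w 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i <w, x_i>), fitted by dual coordinate descent (a solver chosen here for brevity), on a hypothetical synthetic data set of two Gaussian clouds. The helper names `train_linear_svm` and `make_blobs` are made up for this example. The fitted weights satisfy w = sum_i alpha_i y_i x_i, so only samples with alpha_i > 0 (the support vectors) enter the decision function.

```python
import random


def train_linear_svm(X, y, C=1.0, epochs=200, seed=0):
    """Dual coordinate descent for the linear soft-margin (hinge-loss) SVM.

    Illustrative solver, assumed for this sketch. There is no separate bias
    term; a constant feature 1 appended to each x_i acts as a (regularized)
    intercept.
    """
    n, d = len(X), len(X[0])
    alpha = [0.0] * n          # dual variables, one per training sample
    w = [0.0] * d              # primal weights, kept equal to sum_i alpha_i y_i x_i
    q = [sum(v * v for v in x) for x in X]  # diagonal of the dual Hessian
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            if q[i] == 0.0:
                continue
            # Partial derivative of the dual objective with respect to alpha_i.
            grad = y[i] * sum(wj * xj for wj, xj in zip(w, X[i])) - 1.0
            # Exact minimization along coordinate i, clipped to the box [0, C].
            new = min(max(alpha[i] - grad / q[i], 0.0), C)
            delta = new - alpha[i]
            if delta != 0.0:
                alpha[i] = new
                for j in range(d):
                    w[j] += delta * y[i] * X[i][j]
    return w, alpha


def make_blobs(n_per_class=50, seed=1):
    """Hypothetical data: Gaussian clouds around (2, 2) and (-2, -2), plus a bias feature."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([centre + rng.gauss(0, 0.6), centre + rng.gauss(0, 0.6), 1.0])
            y.append(label)
    return X, y


X, y = make_blobs()
w, alpha = train_linear_svm(X, y)
support = [i for i, a in enumerate(alpha) if a > 1e-8]
predictions = [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
print(f"support vectors: {len(support)} of {len(X)} samples")
```

On well-separated data like this, typically only a handful of the 100 training points end up with nonzero alpha_i; the remaining samples could be removed without changing the fitted classifier, which is the property that makes SVMs attractive for large data sets.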
