A tutorial on support vector regression

In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. We also summarize currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss regularization from an SV perspective.
