Training linear SVMs in linear time

Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n as well as a large number of features N, while each example has only s << N non-zero features. This paper presents a Cutting Plane Algorithm for training linear SVMs that provably has training time 0(s,n) for classification problems and o(sn log (n))for ordinal regression problems. The algorithm is based on an alternative, but equivalent formulation of the SVM optimization problem. Empirically, the Cutting-Plane Algorithm is several orders of magnitude faster than decomposition methods like svm light for large datasets.

[1]  J. E. Kelley,et al.  The Cutting-Plane Method for Solving Convex Programs , 1960 .

[2]  R. L. Bradshaw,et al.  RESULTS AND ANALYSIS. , 1971 .

[3]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[4]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[5]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[6]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[7]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[8]  Bernhard Schölkopf,et al.  New Support Vector Algorithms , 2000, Neural Computation.

[9]  Thore Graepel,et al.  Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[10]  David R. Musicant,et al.  Lagrangian Support Vector Machines , 2001, J. Mach. Learn. Res..

[11]  Samy Bengio,et al.  SVMTorch: Support Vector Machines for Large-Scale Regression Problems , 2001, J. Mach. Learn. Res..

[12]  Bernhard Schölkopf,et al.  Learning with kernels , 2001 .

[13]  Glenn Fung,et al.  Proximal support vector machine classifiers , 2001, KDD '01.

[14]  Michael C. Ferris,et al.  Interior-Point Methods for Massive Support Vector Machines , 2002, SIAM J. Optim..

[15]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[16]  Don R. Hush,et al.  Polynomial-Time Decomposition Algorithms for Support Vector Machines , 2003, Machine Learning.

[17]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[18]  Thorsten Joachims,et al.  KDD-Cup 2004: results and analysis , 2004, SKDD.

[19]  S. Sathiya Keerthi,et al.  A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs , 2005, J. Mach. Learn. Res..

[20]  Ivor W. Tsang,et al.  Core Vector Machines: Fast SVM Training on Very Large Data Sets , 2005, J. Mach. Learn. Res..

[21]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[22]  Thorsten Joachims,et al.  Learning to Align Sequences: A Maximum-Margin Approach , 2006 .

[23]  Thorsten Joachims,et al.  A support vector method for multivariate performance measures , 2005, ICML.