A SIMPLE, FAST SUPPORT VECTOR MACHINE ALGORITHM FOR DATA MINING

Support Vector Machines (SVM) and related kernel methods have been shown to build accurate models, but training usually requires solving a quadratic programming problem, so learning from large datasets demands large memory and long running times. The new incremental, parallel and distributed SVM algorithm proposed in this paper, using linear or non-linear kernels, aims at classifying very large datasets on standard personal computers. We extend the recent finite Newton classifier to build an incremental, parallel and distributed SVM algorithm. The new algorithm is very fast and can handle very large datasets in linear or non-linear classification tasks. As an example of its effectiveness, it performs a linear classification of two million datapoints in a 20-dimensional input space into two classes in a few seconds on ten personal computers (3 GHz Pentium IV, 512 MB RAM, Linux).
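Since the approach builds on the finite Newton classifier with a linear kernel, the key computational point is that each Newton step only needs aggregate statistics of the data: a gradient vector and a (d+1)x(d+1) generalized Hessian accumulated over the current margin violators. These sums can be formed chunk by chunk, so the full dataset never has to reside in memory at once. The Python sketch below illustrates this idea only; the function newton_l2svm_incremental, its chunk interface and the toy data are illustrative assumptions, not the authors' implementation, and the parallel/distributed part (splitting the chunks across machines and summing their partial gradients and Hessians) is omitted.

import numpy as np

def newton_l2svm_incremental(chunks, dim, C=1.0, max_iter=20, tol=1e-6):
    """chunks: zero-argument callable yielding (X, y) blocks, X of shape (m, dim),
    y in {-1, +1}. Returns the augmented weight vector w (last entry is the bias)."""
    w = np.zeros(dim + 1)
    for _ in range(max_iter):
        # Accumulate the gradient and generalized Hessian of the L2-SVM objective
        # 0.5*||w||^2 + (C/2)*sum_i max(0, 1 - y_i * w.x_i)^2, one chunk at a time:
        # only the current chunk and a (dim+1)x(dim+1) matrix are held in memory.
        grad = w.copy()
        H = np.eye(dim + 1)
        for X, y in chunks():
            Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
            margins = 1.0 - y * (Xa @ w)
            sv = margins > 0                               # points violating the margin
            if np.any(sv):
                Xs, ys, ms = Xa[sv], y[sv], margins[sv]
                grad -= C * Xs.T @ (ys * ms)
                H += C * Xs.T @ Xs
        step = np.linalg.solve(H, grad)                    # Newton direction
        w -= step
        if np.linalg.norm(step) < tol:                     # stop once the step is tiny
            break
    return w

# Toy usage: 10,000 points in 20 dimensions streamed as 10 blocks of 1,000 rows.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(10000, 20))
y_all = np.sign(X_all[:, 0] + 0.1 * rng.normal(size=10000))

def chunks():
    for i in range(0, 10000, 1000):
        yield X_all[i:i + 1000], y_all[i:i + 1000]

w = newton_l2svm_incremental(chunks, dim=20)
accuracy = np.mean(np.sign(np.hstack([X_all, np.ones((10000, 1))]) @ w) == y_all)

The same accumulation pattern is what makes a distributed version natural: each machine can compute its chunks' contributions to the gradient and Hessian independently, and only these small summaries need to be exchanged and summed before solving the (dim+1)-dimensional Newton system.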
