Fast data sampling for large scale support vector machines

Traditional algorithms for training Support Vector Machines (SVMs) have a worst-case time complexity of O(n^3) and a space complexity of O(n^2), which makes training difficult to scale to large datasets. This paper proposes three algorithms for reducing the training dataset. The algorithms mine the potential support vectors based on their closeness to the decision boundary and use only those points to learn the hyperplane. Closeness to the boundary is estimated with spatial distribution descriptors such as the median and quartiles. A distance-based algorithm is first proposed for linear SVMs and then extended to kernel SVMs using projection vectors. The proposed data sampling algorithms have a time complexity of O(n). In experiments, the algorithms drastically reduce the number of training samples and correspondingly the SVM training time, generally without much loss in classification accuracy.
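
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way a distance-based, median-and-quartile filter for the linear case might look. Everything in it is an assumption made for illustration, not the paper's algorithm: the function name `sample_near_boundary`, the use of the opposite class's coordinate-wise median as a reference point, the Euclidean distance, and the first-quartile cutoff.

```python
import numpy as np


def sample_near_boundary(X, y, quantile=0.25):
    """Return a boolean mask selecting candidate support vectors.

    Hypothetical sketch: for each class, keep the points whose distance
    to the other class's coordinate-wise median falls at or below the
    given quantile of those distances (first quartile by default).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    if labels.size != 2:
        raise ValueError("this sketch handles binary problems only")

    keep = np.zeros(len(y), dtype=bool)
    # Process each class in turn, measuring against the opposite class.
    for own, other in (labels, labels[::-1]):
        own_idx = np.flatnonzero(y == own)
        other_median = np.median(X[y == other], axis=0)
        # Distance from each point of this class to the other class's median.
        dist = np.linalg.norm(X[own_idx] - other_median, axis=1)
        cutoff = np.quantile(dist, quantile)
        # Points closest to the other class are assumed to lie near the boundary.
        keep[own_idx[dist <= cutoff]] = True
    return keep
```

The reduced set `X[keep], y[keep]` would then be passed to any SVM trainer in place of the full dataset. The filter itself makes a constant number of passes over the data, in line with the linear-time claim above, since medians and quantiles can be found by selection rather than a full sort.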
