A geometric approach to train SVM on very large data sets

The reduced set method is an important approach to speeding up support vector machine (SVM) training on large data sets. Existing work has mainly focused on selecting patterns near the decision boundary for SVM training by applying clustering, nearest-neighbor algorithms, and similar techniques. On very large data sets, however, these selection algorithms themselves incur a large computational overhead, so the total running time remains high. In this paper, an intuitive geometric method is developed that selects convex hull samples in the feature space for SVM training, with time complexity linear in the training set size n. Experiments on real data sets show that the proposed method not only preserves the generalization performance of the resulting SVM classifiers but also outperforms existing scale-up methods in terms of training time and number of support vectors.
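The abstract does not say how convex hull samples are located in the feature space. One way to see that such a selection can run in time linear in n is the sketch below. It uses a random-direction heuristic chosen purely for illustration and is not the paper's algorithm. Each direction w is a random combination of a few mapped "anchor" points, so the projection of phi(x) onto w needs only kernel evaluations. A point that maximizes or minimizes a linear functional lies on the boundary of the convex hull of the mapped points. The functions `hull_candidates` and `select_training_subset` and the parameters `n_directions` and `n_anchors` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def linear_kernel(a: Point, b: Point) -> float:
    # Plain dot product: the "feature space" is the input space itself.
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a: Point, b: Point, gamma: float = 0.5) -> float:
    # Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2).
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def hull_candidates(points: List[Point], kernel: Kernel, n_directions: int = 20,
                    n_anchors: int = 5, seed: int = 0) -> List[int]:
    """Hypothetical helper (not the paper's method): indices of points that are
    extreme along random feature-space directions.

    Each direction is w = sum_j c_j * phi(points[a_j]) over a few random anchors,
    so <w, phi(x)> = sum_j c_j * k(points[a_j], x) needs only kernel calls.
    A maximizer (or minimizer) of a linear functional lies on the boundary of the
    convex hull of the mapped points, and is a vertex when it is unique.
    Cost: O(n_directions * n_anchors * n) kernel calls, i.e. linear in n.
    """
    n = len(points)
    if n == 0:
        return []
    rng = random.Random(seed)
    selected = set()
    for _ in range(n_directions):
        anchors = rng.sample(range(n), min(n_anchors, n))
        coeffs = [rng.gauss(0.0, 1.0) for _ in anchors]
        scores = [sum(c * kernel(points[a], x) for c, a in zip(coeffs, anchors))
                  for x in points]
        hi = max(range(n), key=scores.__getitem__)
        lo = min(range(n), key=scores.__getitem__)
        if scores[hi] == scores[lo]:
            continue  # degenerate direction (w == 0): no point is extreme
        selected.update((hi, lo))
    return sorted(selected)


def select_training_subset(points: List[Point], labels: List[int], kernel: Kernel,
                           **kwargs) -> List[int]:
    # Hypothetical helper: run the selection per class and keep the union.
    chosen: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        local = hull_candidates([points[i] for i in idx], kernel, **kwargs)
        chosen.extend(idx[j] for j in local)
    return sorted(chosen)
```

An SVM would then be trained only on the returned indices. One caveat applies to this sketch. With an RBF kernel, k(x, x) = 1, so every mapped point lies on the unit sphere and is therefore a vertex of the hull. In that case the number of directions, not hull membership alone, controls how many points are kept.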
