Geometric approach to support vector machines learning for large datasets

The dissertation introduces Sphere Support Vector Machines (SphereSVM) and Minimal Norm Support Vector Machines (MNSVM), new fast classification algorithms that exploit the geometric properties of the underlying classification problems to obtain models of the training data efficiently. SphereSVM combines the minimal enclosing ball approach, state-of-the-art solvers for the nearest point problem, and probabilistic techniques. Together, these significantly speed up the SVM training phase. They also reach practically the same accuracy as other classification models on several large real-world datasets, evaluated within the strict framework of double (nested) cross-validation (CV). MNSVM further simplifies the SphereSVM algorithm: it converts a relatively complex classification task into one of the simplest geometric problems, the minimal norm problem, which yields an additional speedup over SphereSVM. The results indicate that both SphereSVM and MNSVM are strong alternatives for handling large and ultra-large datasets in reasonable time, without resorting to the various parallelization schemes for SVM algorithms proposed recently. Variants of both algorithms that work without an explicit bias term are also presented. In addition, other techniques for improving time efficiency, such as over-relaxation and an improved support vector selection scheme, are discussed. Finally, the accuracy and performance of all these modifications are analyzed in detail, and results obtained with the nested cross-validation procedure are reported.
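The minimal norm problem mentioned above asks for the point of a convex hull conv{z_1, ..., z_n} that lies closest to the origin. Equivalently, it asks to minimize alpha^T G alpha over the probability simplex, where G[i, j] = <z_i, z_j> can be a kernel matrix. The following is a minimal sketch under that assumption. The function `min_norm_point` and its parameters `tol` and `max_iter` are hypothetical. The solver is a plain Gilbert / Frank-Wolfe iteration with exact line search. It is not the dissertation's MNSVM implementation, and it omits the mapping from a labelled training set to the points z_i.

```python
import numpy as np


def min_norm_point(gram, tol=1e-9, max_iter=10_000):
    """Gilbert / Frank-Wolfe iteration (illustrative sketch) for
        min_alpha  alpha^T G alpha   s.t.  sum(alpha) = 1, alpha >= 0,
    i.e. the point w = sum_k alpha_k z_k of conv{z_1, ..., z_n} closest to
    the origin, given G[i, j] = <z_i, z_j> (a kernel matrix works as well)."""
    gram = np.asarray(gram, dtype=float)
    n = gram.shape[0]
    # Start from the vertex with the smallest norm.
    j = int(np.argmin(np.diag(gram)))
    alpha = np.zeros(n)
    alpha[j] = 1.0
    g_alpha = gram[:, j].copy()           # cached G @ alpha, updated in O(n)
    for _ in range(max_iter):
        norm_sq = float(alpha @ g_alpha)  # ||w||^2
        i = int(np.argmin(g_alpha))       # vertex minimizing <z_i, w>
        gap = norm_sq - g_alpha[i]        # half the Frank-Wolfe duality gap, >= 0
        if gap <= tol:
            break
        # Exact line search along d = e_i - alpha, clipped to [0, 1].
        curv = gram[i, i] - 2.0 * g_alpha[i] + norm_sq  # d^T G d
        if curv <= 0.0:
            break
        t = min(1.0, gap / curv)
        alpha *= 1.0 - t
        alpha[i] += t
        g_alpha = (1.0 - t) * g_alpha + t * gram[:, i]
    return alpha, float(alpha @ g_alpha)


# Usage: three points in the plane; the hull point nearest the origin is (1, 0).
Z = np.array([[1.0, 1.0], [1.0, -1.0], [2.0, 0.0]])
alpha, norm_sq = min_norm_point(Z @ Z.T)  # alpha -> [0.5, 0.5, 0.0], norm_sq -> 1.0
w = Z.T @ alpha                           # w -> [1.0, 0.0]
```

Each iteration reads a single column of G, so the cost per step is O(n) and the full kernel matrix never has to be factorized. Plain Frank-Wolfe converges only sublinearly, however, so this sketch shows the geometric primitive rather than a tuned solver.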
