A Novel SVM Classification Method for Large Data Sets

Standard support vector machine (SVM) algorithms are not suitable for classifying large data sets because of their high training complexity. This paper introduces a novel SVM classification approach for large data sets that proceeds in two phases. In the first phase, an approximate classification is obtained by training an SVM on data selected from the original data set with fast clustering techniques. In the second phase, the classification is refined using only the data near the approximate hyperplane obtained in the first phase. Experimental results demonstrate that our approach achieves good classification accuracy while training significantly faster than other SVM classifiers. The proposed classifier therefore offers distinct advantages when dealing with very large data sets.
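The two-phase idea can be illustrated with a short sketch. The code below is a minimal illustration using scikit-learn; the choice of MiniBatchKMeans, the way cluster representatives are picked, and the margin threshold used to select boundary points are all assumptions made for illustration, not the authors' exact procedure.

# Minimal sketch of the two-phase scheme described in the abstract.
# Assumptions (not from the paper): MiniBatchKMeans for the fast clustering
# step, one representative per cluster, and a fixed margin threshold for
# selecting points near the approximate hyperplane.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

def two_phase_svm(X, y, n_clusters=200, margin=1.0):
    """Train an approximate SVM on cluster representatives, then refine it
    on the points that lie near the approximate decision boundary."""
    # Phase 1: fast clustering to pick a small training subset.
    km = MiniBatchKMeans(n_clusters=n_clusters, random_state=0).fit(X)
    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        if members.size:
            # Use the sample closest to each centroid as its representative.
            d = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
            reps.append(members[np.argmin(d)])
    reps = np.array(reps)
    approx = SVC(kernel="rbf", gamma="scale").fit(X[reps], y[reps])

    # Phase 2: refine using only the data near the approximate hyperplane.
    # The representatives are kept in the refinement set so that both
    # classes remain present (a simplification for this sketch).
    near = np.where(np.abs(approx.decision_function(X)) < margin)[0]
    idx = np.union1d(reps, near)
    refined = SVC(kernel="rbf", gamma="scale").fit(X[idx], y[idx])
    return refined

In this sketch the first SVM sees only one point per cluster, so its cost is governed by the number of clusters rather than the size of the data set, and the second SVM sees only the (usually small) fraction of points whose decision value falls inside the margin band.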
