Data Selection Using Decision Tree for SVM Classification

Support Vector Machine (SVM) is an important classification method used in many areas. Training an SVM takes almost O(n^{2}) time and space, where n is the number of training examples. Several methods to reduce the training complexity have been proposed in recent years. Data selection methods for SVM select the most important examples from the training set to reduce training time. This paper introduces a novel data reduction method that detects clusters and then selects some examples from them. Unlike other state-of-the-art algorithms, the novel method uses a decision tree to form partitions that are treated as clusters, and then performs a guided random selection of examples. Because the clusters discovered by a decision tree can be linearly separable, the Eidelheit separation theorem makes it possible to reduce the size of training sets by carefully selecting examples from them. The novel method was compared with LibSVM on publicly available data sets; the experiments demonstrate an important reduction in the size of the training sets with only a slight decrease in classifier accuracy.
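The pipeline described above (tree partitions treated as clusters, followed by a guided random selection) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the tree learner or the selection rule, so the small Gini-based tree, the rule "keep every example in a mixed-class leaf, sample a fraction of each pure leaf", and all names and parameters (`leaf_partitions`, `select_examples`, `keep_fraction`, `max_depth`, `min_size`) are assumptions made only for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    n = len(y)
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def leaf_partitions(X, y, idx=None, depth=0, max_depth=3, min_size=5):
    """Grow a small axis-aligned tree; return one list of example indices per leaf."""
    if idx is None:
        idx = list(range(len(y)))
    labels = [y[i] for i in idx]
    if depth >= max_depth or len(idx) <= min_size or len(set(labels)) == 1:
        return [idx]
    split = best_split([X[i] for i in idx], labels)
    if split is None:
        return [idx]
    f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (leaf_partitions(X, y, left, depth + 1, max_depth, min_size)
            + leaf_partitions(X, y, right, depth + 1, max_depth, min_size))


def select_examples(X, y, keep_fraction=0.2, seed=0, **tree_kwargs):
    """Assumed selection rule: mixed leaves are kept whole, pure leaves are sampled."""
    rng = random.Random(seed)
    selected = []
    for leaf in leaf_partitions(X, y, **tree_kwargs):
        if len({y[i] for i in leaf}) > 1:
            # Mixed-class leaf: likely close to the class boundary, keep everything.
            selected.extend(leaf)
        else:
            # Pure leaf: keep only a random fraction (at least one example).
            k = max(1, round(keep_fraction * len(leaf)))
            selected.extend(rng.sample(leaf, k))
    return sorted(selected)
```

The returned indices identify the reduced training set, which would then be passed to an SVM trainer such as LibSVM in place of the full data. Keeping mixed leaves intact is one plausible way to preserve examples near the decision boundary, where support vectors tend to lie; the method in the paper may guide the selection differently.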
