Feature-Selected Tree-Based Classification

Feature selection can decrease classifier size and improve accuracy by removing noisy and/or redundant features. However, feature selection may yield features that are only partially informative about the full set of classes: features that help distinguish some classes but not others. In such cases, it is beneficial to divide the large classification problem into a set of smaller problems, so that a more specific set of features can be used for each group of classes. Dividing a problem this way is also common when the base classifier is binary and the multiclass problem must be reformulated as a set of two-class problems before the classifier can handle it. This paper presents a method for multiclass classification that simultaneously builds a binary tree of simpler classification subproblems and performs feature selection for the individual classifiers. The feature-selected hierarchical classifier (FSHC) is compared against several well-known techniques for multiclass division. Tests are run on nine real data sets and one artificial data set using a support vector machine (SVM) as the base classifier. The results show that the accuracy of the FSHC is comparable to that of other common multiclass SVM methods. Furthermore, the FSHC produces solutions with fewer classifiers, fewer features, and shorter testing times than the other SVM multiclass extensions.
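To make the idea of a feature-selected binary classification tree concrete, the sketch below recursively splits the remaining classes into two groups, selects a feature subset for that particular split, and trains a binary SVM at each internal node. It is a minimal illustration only: the class-grouping rule (k-means on class centroids), the selection criterion (an ANOVA F-test via scikit-learn's SelectKBest), and the linear SVM are assumptions made for this example, not the grouping rule or feature-selection method of the FSHC itself.

```python
# Minimal sketch of a feature-selected binary-tree classifier.
# Assumptions (not from the paper): classes are split into two groups by
# k-means on their centroids, and features are ranked with a univariate
# ANOVA F-test; the FSHC's actual grouping and selection criteria differ.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC


class TreeNode:
    def __init__(self):
        self.selector = None   # per-node feature selector
        self.svm = None        # binary SVM separating the two class groups
        self.left = None       # subtree for group 0
        self.right = None      # subtree for group 1
        self.label = None      # class label if this node is a leaf


def build_tree(X, y, k_features=5):
    node = TreeNode()
    classes = np.unique(y)
    if len(classes) == 1:                      # leaf: one class remains
        node.label = classes[0]
        return node

    # Split the remaining classes into two groups by clustering their centroids.
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(centroids)
    group_of = dict(zip(classes, groups))
    meta_y = np.array([group_of[c] for c in y])

    # Per-node feature selection: keep only features useful for THIS split.
    k = min(k_features, X.shape[1])
    node.selector = SelectKBest(f_classif, k=k).fit(X, meta_y)
    node.svm = SVC(kernel="linear").fit(node.selector.transform(X), meta_y)

    # Recurse on each group of classes, starting again from all features.
    left_mask, right_mask = meta_y == 0, meta_y == 1
    node.left = build_tree(X[left_mask], y[left_mask], k_features)
    node.right = build_tree(X[right_mask], y[right_mask], k_features)
    return node


def predict_one(node, x):
    # Follow a single root-to-leaf path, using only that path's features.
    while node.label is None:
        side = node.svm.predict(node.selector.transform(x.reshape(1, -1)))[0]
        node = node.left if side == 0 else node.right
    return node.label
```

Training would be root = build_tree(X_train, y_train) and prediction predict_one(root, x). Because a test sample traverses only one root-to-leaf path, only the features selected along that path are evaluated, which illustrates why a tree of feature-selected binary classifiers can need fewer features and less testing time than one-vs-one or one-vs-all ensembles.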
