A feature subset selection method based on symmetric uncertainty and Ant Colony Optimization

Feature subset selection is one of the key problems in pattern recognition and machine learning. It refers to the problem of selecting only those features that are useful in predicting a target concept, i.e. the class. Data acquired from different sources are usually not screened for any specific task, e.g. classification, clustering, or anomaly detection. When such unscreened data are fed to a learning algorithm, its results deteriorate. The proposed method is a pure filter-based feature subset selection technique that incurs less computational cost and is highly efficient in terms of classification accuracy. Moreover, along with high accuracy, the proposed method requires fewer features in most cases. The proposed method addresses the issue of feature ranking and threshold value selection: it adaptively selects the number of features according to the worth of each individual feature in the dataset. Extensive experiments are performed on a number of benchmark datasets using three well-known classification algorithms. Empirical results support the efficiency and effectiveness of the proposed method.
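
The filter criterion named in the title, symmetric uncertainty, is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is the Shannon entropy and I the mutual information. The following is a minimal sketch of that standard definition for discrete features. It is not the authors' implementation: the function names are hypothetical, and the Ant Colony Optimization search and adaptive threshold are not shown because the abstract gives no details of them.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant; treat them as carrying no information.
        return 0.0
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, target):
    """Rank feature names by SU with the class label (hypothetical helper).

    `columns` maps a feature name to its list of discrete values.
    """
    scores = {name: symmetric_uncertainty(col, target)
              for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A feature identical to the class label scores 1 and a feature independent of it scores 0, so a ranking of this kind gives each individual feature a worth that a subsequent search could use.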
