A novel information-theoretic interact algorithm (IT-IN) for feature selection evaluated with three machine learning algorithms

The inclusion of irrelevant, redundant, and inconsistent features in a data-mining model results in poor predictions and high computational overhead. This paper proposes a novel information-theoretic interact (IT-IN) algorithm for feature selection that accounts for the relevance, redundancy, and consistency of features. The proposed IT-IN algorithm is compared with the existing Interact, FCBF, Relief, and CFS feature selection algorithms. To evaluate the classification accuracy of IT-IN and the four existing algorithms, Naive Bayes, SVM, and ELM classifiers are applied to ten datasets from the UCI repository. The proposed IT-IN outperforms the existing algorithms in terms of the number of selected features. A specially designed hash function speeds up the IT-IN algorithm, giving it a lower computation time than the Interact algorithm. The results clearly show that the proposed feature selection algorithm improves classification accuracy for the ELM, Naive Bayes, and SVM classifiers, and that IT-IN combined with the ELM classifier performs best.
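The abstract does not give the IT-IN formulas themselves, but information-theoretic feature selection filters of the family it builds on (e.g. FCBF, which is also in the comparison) typically score each candidate feature against the class label using symmetrical uncertainty, a normalized form of mutual information. The sketch below is an illustrative example of that scoring step only, not the authors' IT-IN algorithm; all function names and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(X) of a discrete sequence, in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def symmetrical_uncertainty(x, y):
    """SU(X,Y) = 2*I(X;Y) / (H(X) + H(Y)); lies in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant: no information to share
    return 2.0 * mutual_information(x, y) / (hx + hy)

# Rank candidate features by relevance to the class label (toy data).
features = {
    "f1": [0, 0, 1, 1],  # perfectly predicts the label -> SU = 1.0
    "f2": [0, 1, 0, 1],  # statistically independent of the label -> SU = 0.0
}
label = [0, 0, 1, 1]
ranking = sorted(features,
                 key=lambda f: symmetrical_uncertainty(features[f], label),
                 reverse=True)
```

Filters such as FCBF then keep a feature only if its SU with the class exceeds its SU with every already-selected feature, which addresses relevance and redundancy; IT-IN additionally considers consistency, for which the abstract gives no formula.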
