Feature Selection Method Based on Differential Correlation Information Entropy

Feature selection is a major component of pattern classification systems. In earlier work, Ding and Peng recognized its importance and proposed the minimum-redundancy feature selection method, which sequentially selects features for microarray gene expression data while minimizing redundancy among them. However, because that method relies on mutual information, which measures dependency only between pairs of random variables, its results cannot be optimal: the selected feature subset is never evaluated as a whole. Therefore, building on the minimum-redundancy-maximum-correlation framework, this paper introduces an entropy measure that evaluates a feature subset globally and proposes a new subset evaluation criterion, differential correlation information entropy. Different bivariate correlation metrics can be plugged into this criterion, and feature selection is then carried out by sequential forward search. Using two different classification models on eleven standard data sets from the UCI machine learning repository, we compare our method against several baselines, including mRMR, ReliefF, and the feature selection method with joint maximal information entropy. The experimental results show that feature selection based on the proposed method clearly outperforms the other models.
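
The abstract does not spell out the differential correlation information entropy formula, so the sketch below only illustrates the general shape of the approach: an eigenvalue-based entropy over the correlation matrix of a candidate subset stands in for the global redundancy measure, a pluggable bivariate metric supplies the class relevance, and sequential forward search greedily grows the subset. All names, the entropy definition, and the way the two terms are combined are illustrative assumptions, not the paper's exact method.

```python
# A minimal sketch of sequential forward search driven by a correlation-entropy
# criterion. The paper's exact differential correlation information entropy
# formula is not given in the abstract; the eigenvalue-based entropy below is
# an illustrative assumption, as are all function names.
import numpy as np

def correlation_entropy(X_subset):
    """Normalized entropy of the eigenvalue spectrum of the subset's
    Pearson correlation matrix: redundant features concentrate the
    eigenvalues (low entropy); complementary features spread them (high)."""
    n = X_subset.shape[1]
    if n == 1:
        return 1.0  # a single feature carries no redundancy
    R = np.corrcoef(X_subset, rowvar=False)   # assumes non-constant columns
    eigvals = np.linalg.eigvalsh(R)           # R is symmetric PSD
    p = np.clip(eigvals, 1e-12, None) / n     # eigenvalues of R sum to n
    return float(-np.sum(p * np.log(p)) / np.log(n))  # scaled to [0, 1]

def relevance(x, y):
    """One pluggable bivariate correlation metric: absolute Pearson
    correlation between a feature and the class labels."""
    return abs(np.corrcoef(x, y)[0, 1])

def sequential_forward_search(X, y, k):
    """Greedily add the feature that best trades individual class
    relevance against the redundancy of the growing subset."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            subset = X[:, selected + [j]]
            return relevance(X[:, j], y) * correlation_entropy(subset)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Example: features 0 and 3 jointly determine the label, feature 5 copies 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=200)  # near-duplicate of feature 0
y = (X[:, 0] + X[:, 3] > 0).astype(float)
print(sequential_forward_search(X, y, k=2))      # expected: picks 0 (or 5) and 3
```

In this sketch, the product of relevance and entropy rewards features that are individually predictive while keeping the subset's correlation spectrum spread out, which is why the near-duplicate feature is rejected in the example; the paper may combine the two terms differently.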

[1] Isabelle Guyon, et al. An Introduction to Feature Extraction, 2006, Feature Extraction.

[2] Manuel Graña, et al. Evolutionary ELM wrapper feature selection for Alzheimer's disease CAD on anatomical brain MRI, 2014, Neurocomputing.

[3] Wenyong Wang, et al. A new feature selection method based on a validity index of feature subset, 2017, Pattern Recognit. Lett.

[4] Yi Liu, et al. FS_SFS: A novel feature selection method for support vector machines, 2006, Pattern Recognit.

[5] Michael Mitzenmacher, et al. Detecting Novel Associations in Large Data Sets, 2011, Science.

[6] Pat Langley, et al. Selection of Relevant Features and Examples in Machine Learning, 1997, Artif. Intell.

[7] F. Fleuret. Fast Binary Feature Selection with Conditional Mutual Information, 2004, J. Mach. Learn. Res.

[8] Fuhui Long, et al. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Kangfeng Zheng, et al. Feature selection method with joint maximal information entropy between features and class, 2018, Pattern Recognit.

[10] Turker Tekin Erguzel, et al. A wrapper-based approach for feature selection and classification of major depressive disorder-bipolar disorders, 2015, Comput. Biol. Medicine.

[11] Ron Kohavi, et al. Wrappers for Feature Subset Selection, 1997, Artif. Intell.

[12] Roberto Battiti, et al. Using mutual information for selecting features in supervised neural net learning, 1994, IEEE Trans. Neural Networks.

[13] Gianluca Bontempi, et al. On the Use of Variable Complementarity for Feature Selection in Cancer Classification, 2006, EvoWorkshops.

[14] Chris H. Q. Ding, et al. Minimum redundancy feature selection from microarray gene expression data, 2003, Computational Systems Bioinformatics (CSB2003), Proceedings of the 2003 IEEE Bioinformatics Conference.

[15] Marco Cristani, et al. Infinite Feature Selection, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16] Youngjoong Ko, et al. Hierarchical speech-act classification for discourse analysis, 2013, Pattern Recognit. Lett.

[17] Leo Breiman. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author), 2001, Statistical Science.

[18] Umberto Castellani, et al. Infinite Latent Feature Selection: A Probabilistic Latent Graph-Based Ranking Approach, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[19] M. Carmen Garrido, et al. Feature subset selection Filter-Wrapper based on low quality data, 2013, Expert Syst. Appl.

[20] Richard Weber, et al. A wrapper method for feature selection using Support Vector Machines, 2009, Inf. Sci.

[21] Bin Wu, et al. Feature subset selection combining maximal information entropy and maximal information coefficient, 2019, Applied Intelligence.

[22] Simone Melzi, et al. Ranking to Learn: Feature Ranking and Selection via Eigenvector Centrality, 2016, NFMCP@PKDD/ECML.

[23] Aleks Jakulin. Machine Learning Based on Attribute Interactions, 2005.

[24] Hamid Sheikhzadeh, et al. Combined mRMR filter and sparse Bayesian classifier for analysis of gene expression data, 2016, 2016 2nd International Conference of Signal Processing and Intelligent Systems (ICSPIS).

[25] Stan Matwin, et al. A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization, 2001.

[26] Chongyi Li, et al. A Feature Selection Algorithm Based on Equal Interval Division and Minimal-Redundancy–Maximal-Relevance, 2019, Neural Processing Letters.

[27] Masoud Nikravesh, et al. Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing), 2006.

[28] Gang Chen, et al. A novel wrapper method for feature selection and its applications, 2015, Neurocomputing.

[29] Larry A. Rendell, et al. The Feature Selection Problem: Traditional Methods and a New Algorithm, 1992, AAAI.