Dynamic feature selection method with minimum redundancy information for linear data

Feature selection plays a fundamental role in many data mining and machine learning tasks. In this paper, we propose a novel feature selection method, namely, the Dynamic Feature Selection Method with Minimum Redundancy Information (MRIDFS). In MRIDFS, conditional mutual information is used to measure the relevance and the redundancy among multiple features, and a new concept, the feature-dependent redundancy ratio, is introduced. This ratio represents redundancy more accurately. To evaluate our method, MRIDFS is tested and compared with seven popular methods on 16 benchmark data sets. Experimental results show that MRIDFS outperforms the compared methods in terms of average classification accuracy.
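The abstract describes the method only at a high level; the exact MRIDFS scoring function is not given here. The Python sketch below illustrates the general idea under stated assumptions: relevance and redundancy are estimated from discrete (or pre-discretized) features via plug-in entropy estimates, and the "redundancy ratio" is modeled hypothetically as the fraction of a candidate's relevance I(f; y) already carried by each selected feature, (I(f; y) - I(f; y | s)) / I(f; y). The function names and the greedy criterion are illustrative, not the paper's definitions.

```python
import numpy as np
from collections import Counter

def entropy(*cols):
    """Plug-in Shannon entropy (nats) of the joint distribution of discrete columns."""
    joint = list(zip(*cols))
    counts = np.array(list(Counter(joint).values()), dtype=float)
    p = counts / len(joint)
    return -np.sum(p * np.log(p))

def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def cond_mutual_info(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def greedy_select(X, y, k):
    """Forward selection sketch: pick, at each step, the candidate maximizing
    relevance discounted by an average redundancy ratio over the selected set.
    This criterion is an assumption for illustration, not the exact MRIDFS score."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best_f, best_score = None, -np.inf
        for f in remaining:
            rel = mutual_info(X[:, f], y)
            if selected:
                # Hypothetical redundancy ratio w.r.t. each selected feature s:
                # share of I(f; y) already explained by s.
                red = np.mean([
                    (rel - cond_mutual_info(X[:, f], y, X[:, s])) / (rel + 1e-12)
                    for s in selected
                ])
            else:
                red = 0.0
            score = rel * (1.0 - red)
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected
```

Usage note: the plug-in estimators above assume categorical inputs, so continuous features should first be discretized (e.g., equal-width binning), a common preprocessing step for mutual-information-based selectors.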
