Feature selection considering two types of feature relevancy and feature interdependency

Highlights:
- A novel feature selection method based on information theory is proposed.
- Our method divides feature relevancy into two categories.
- Experiments were performed on 12 public data sets.
- Our method outperforms five competing methods in terms of accuracy.
- Our method selects a small number of features while achieving the highest accuracy.

Feature selection based on information theory, which selects a group of the most informative features, has extensive applications in fields such as machine learning, data mining, and natural language processing. However, many previous methods suffer from two common defects: (1) feature relevancy is defined without distinguishing candidate feature relevancy from selected feature relevancy, and (2) some interdependent features may be misinterpreted as redundant features. In this study, we propose a feature selection method named Dynamic Relevance and Joint Mutual Information Maximization (DRJMIM) to address these two defects. DRJMIM consists of four stages. First, relevancy is divided into two categories: candidate feature relevancy and selected feature relevancy. Second, according to candidate feature relevancy, defined as joint mutual information, some apparently redundant features are selected. Third, these redundant features are combined with a dynamic weight that reduces the selection probability of truly redundant features while increasing that of falsely identified ones. Finally, the most informative and interdependent features are selected while truly redundant features are simultaneously eliminated. Furthermore, our method is compared with five competitive feature selection methods on 12 publicly available data sets. The classification results show that DRJMIM outperforms the other five methods, and the statistical significance of this result is verified by a paired two-tailed t-test. Meanwhile, DRJMIM selects a small number of features when it achieves its highest classification accuracy.
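The abstract does not give DRJMIM's exact scoring formula, but the greedy, information-theoretic selection loop it builds on can be illustrated. The sketch below is a minimal, hypothetical implementation of the related JMIM-style criterion (select the candidate maximizing the minimum joint mutual information I(F, S; Y) over already-selected features S); all function names and the discrete-variable entropy estimator are our own illustrative choices, not the paper's code.

```python
# Minimal sketch of greedy joint-mutual-information feature selection
# (JMIM-style max-min criterion), assuming discrete feature and label values.
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy H(X) in bits of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mutual_info(f, y):
    """Mutual information I(F; Y) = H(F) + H(Y) - H(F, Y)."""
    return entropy(f) + entropy(y) - entropy(list(zip(f, y)))

def joint_mutual_info(f, s, y):
    """Joint mutual information I((F, S); Y) for discrete sequences."""
    return entropy(list(zip(f, s))) + entropy(y) - entropy(list(zip(f, s, y)))

def select_features(X, y, k):
    """Greedily pick k column indices of X (a list of rows) by the
    max-min joint mutual information criterion."""
    n_feat = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_feat)]
    # Seed with the single most relevant feature: argmax I(F; Y).
    selected = [max(range(n_feat), key=lambda j: mutual_info(cols[j], y))]
    while len(selected) < k:
        rest = [j for j in range(n_feat) if j not in selected]
        # Score each candidate by its worst-case joint MI with the
        # already-selected features, then take the best candidate.
        score = lambda j: min(joint_mutual_info(cols[j], cols[s], y)
                              for s in selected)
        selected.append(max(rest, key=score))
    return selected
```

On a toy data set where column 0 equals the label, `select_features` picks index 0 first, since I(F0; Y) = H(Y) is maximal. DRJMIM's contribution, per the abstract, is to further reweight candidates dynamically so that interdependent features are not discarded as redundant; that reweighting is not reproduced here.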
