Feature selection considering the composition of feature relevancy

Abstract: Feature selection plays a critical role in classification problems. Feature selection methods aim to retain relevant features while eliminating redundant ones. This work focuses on feature selection methods based on information theory. By analyzing the composition of feature relevancy, we argue that a good feature selection method should maximize the new classification information contributed by a candidate feature while minimizing its redundancy with already-selected features. We therefore propose a novel feature selection method named Composition of Feature Relevancy (CFR). To evaluate CFR, we conduct experiments on eight real-world data sets with two different classifiers (Naive Bayes and Support Vector Machine). Our method outperforms five competing methods in terms of both average and highest classification accuracy.
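The criterion described in the abstract, maximizing new classification information while minimizing feature redundancy, can be illustrated with a greedy forward search. The sketch below is a minimal illustration and not the paper's exact CFR objective: it assumes discrete (pre-discretized) features, estimates "new classification information" with the conditional mutual information I(f; C | s) and redundancy with the mutual information I(f; s), and the helper names `conditional_mutual_info` and `greedy_mi_selection` are hypothetical.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def conditional_mutual_info(x, y, z):
    """Estimate I(X; Y | Z) for discrete variables by averaging
    I(X; Y) within the strata defined by each value of Z."""
    cmi, n = 0.0, len(z)
    for value in np.unique(z):
        mask = (z == value)
        if mask.sum() > 0:
            cmi += (mask.sum() / n) * mutual_info_score(x[mask], y[mask])
    return cmi


def greedy_mi_selection(X, y, k):
    """Greedy forward selection sketch (not the exact CFR criterion):
    at each step pick the feature that adds the most new class
    information beyond the selected features while sharing the least
    information with them. X and y must be discrete-valued arrays."""
    n_features = X.shape[1]
    # First feature: largest mutual information with the class labels.
    relevance = [mutual_info_score(X[:, j], y) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]

    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # New classification information carried beyond each selected feature.
            new_info = sum(conditional_mutual_info(X[:, j], y, X[:, s])
                           for s in selected)
            # Redundancy with the features already selected.
            redundancy = sum(mutual_info_score(X[:, j], X[:, s])
                             for s in selected)
            score = new_info - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Continuous features would need to be discretized first (for example by equal-width binning) before the mutual-information estimates above are meaningful.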
