Supervised feature selection using integration of densest subgraph finding with floating forward-backward search

Abstract In this paper, a novel supervised feature selection approach is proposed based on the principle of dense subgraph discovery. To exploit dense subgraph discovery for feature selection, the dataset is first mapped to an equivalent weighted graph representation, with the set of all features as the vertex set and the mutual dependency between each pair of features as the weight of the corresponding edge. The proposed algorithm proceeds in two phases. In the first phase, a dense subgraph is discovered such that the features within it are maximally non-redundant with respect to one another while their average class relevance and average standard deviation are as large as possible. To this end, a novel induced degree is defined for each feature that incorporates these three objectives, and a modified version of an existing approximation algorithm is used to find the dense subgraph. In the second phase, a floating forward-backward search is performed on the resulting dense subgraph to obtain a better feature subset. In both phases, an existing normalized mutual information score is employed to compute both class relevance and redundancy. The main contribution of this paper is a feature selection strategy whose reduced feature set exhibits maximal average class relevance, minimal average pairwise redundancy, and good discriminating power. Experimental results demonstrate that the proposed approach is competitive with several conventional as well as state-of-the-art supervised feature selection algorithms.
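
The abstract outlines a complete pipeline: an NMI-weighted feature graph, an induced degree that folds in class relevance, pairwise redundancy, and standard deviation, a greedy approximation for the dense subgraph, and a floating forward-backward refinement. Below is a minimal Python sketch of that pipeline, not the authors' implementation: the weighting scheme inside induced_degree, the greedy-peeling variant, and the helper names (build_feature_graph, dense_subset, floating_search) are illustrative assumptions, and features are assumed to be pre-discretized so that scikit-learn's normalized_mutual_info_score applies.

```python
# Sketch of the two-phase pipeline described in the abstract.
# Assumptions (not from the paper): features are pre-discretized integer
# columns, NMI comes from scikit-learn, and the induced degree combines
# relevance, spread, and redundancy by a simple weighted sum.

import numpy as np
from itertools import combinations
from sklearn.metrics import normalized_mutual_info_score

def build_feature_graph(X, y):
    """Edge weight = pairwise NMI (redundancy); node scores = class
    relevance (NMI with the labels) and per-feature standard deviation."""
    n = X.shape[1]
    W = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        W[i, j] = W[j, i] = normalized_mutual_info_score(X[:, i], X[:, j])
    rel = np.array([normalized_mutual_info_score(X[:, i], y) for i in range(n)])
    std = X.std(axis=0)
    return W, rel, std

def induced_degree(W, rel, std, S, i, alpha=1.0, beta=1.0):
    """Illustrative induced degree of feature i within subset S:
    reward relevance and spread, penalize redundancy with the rest of S."""
    others = [j for j in S if j != i]
    red = W[i, others].mean() if others else 0.0
    return alpha * rel[i] + beta * std[i] - red

def dense_subset(W, rel, std, k):
    """Phase 1, greedy peeling (Charikar-style approximation): repeatedly
    drop the feature with the smallest induced degree until k remain."""
    S = set(range(W.shape[0]))
    while len(S) > k:
        worst = min(S, key=lambda i: induced_degree(W, rel, std, S, i))
        S.remove(worst)
    return S

def subset_score(W, rel, std, S):
    """Average induced degree: the quantity the floating search improves."""
    return np.mean([induced_degree(W, rel, std, S, i) for i in S])

def floating_search(W, rel, std, S, max_iter=50):
    """Phase 2, SFFS-style refinement: tentatively add the best outside
    feature, then backtrack-remove features while the score improves."""
    S, n = set(S), W.shape[0]
    for _ in range(max_iter):
        improved = False
        outside = set(range(n)) - S
        if outside:  # forward step
            cand = max(outside, key=lambda i: subset_score(W, rel, std, S | {i}))
            if subset_score(W, rel, std, S | {cand}) > subset_score(W, rel, std, S):
                S.add(cand)
                improved = True
        while len(S) > 2:  # backward (floating) steps
            worst = min(S, key=lambda i: subset_score(W, rel, std, S - {i}))
            if subset_score(W, rel, std, S - {worst}) > subset_score(W, rel, std, S):
                S.remove(worst)
                improved = True
            else:
                break
        if not improved:
            break
    return sorted(S)
```

Under these assumptions the two phases compose directly: after W, rel, std = build_feature_graph(X, y), calling floating_search(W, rel, std, dense_subset(W, rel, std, k=20)) returns the refined subset. The paper's actual induced degree and modified approximation algorithm should be substituted where these placeholders stand.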
