Feature Selection via Global Redundancy Minimization

Feature selection has long been an important research topic in data mining because real data sets, such as those in bioinformatics and text mining applications, often have high-dimensional features. Many existing filter feature selection methods rank features by optimizing a feature ranking criterion, so correlated features often receive similar rankings. Such correlated features are redundant and contribute little additional mutual information to the data mining task. Thus, when only a limited number of features can be selected, we want the top non-redundant features so that the useful mutual information is maximized. In previous research, Ding et al. recognized this issue and proposed the minimum Redundancy Maximum Relevance (mRMR) feature selection model, which minimizes the redundancy between sequentially selected features. However, that method uses greedy search, so the global feature redundancy is not considered and the results are not optimal. In this paper, we propose a new feature selection framework that globally minimizes feature redundancy while maximizing the given feature ranking scores, which can come from any supervised or unsupervised method. Our new model has no parameters, which makes it especially suitable for practical data mining applications. Experimental results on benchmark data sets show that the proposed method consistently improves the feature selection results compared to the original methods. In addition, we introduce a new unsupervised global and local discriminative feature selection method, which can be unified with the global feature redundancy minimization framework and shows superior performance.
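
The contrast between greedy, sequential redundancy handling and a global view can be illustrated with a toy sketch. This is not the formulation proposed in the paper, which the abstract does not spell out. It assumes a hypothetical objective (sum of ranking scores minus the sum of pairwise redundancies within the subset), hypothetical function names, and made-up scores and redundancy values. The exhaustive search is only a stand-in for "global" and is feasible only for a handful of features.

```python
from itertools import combinations

def objective(subset, scores, redundancy):
    """Hypothetical objective: total ranking score minus pairwise redundancy."""
    relevance = sum(scores[i] for i in subset)
    penalty = sum(redundancy[i][j] for i, j in combinations(subset, 2))
    return relevance - penalty

def greedy_select(scores, redundancy, k):
    """Sequential selection: start from the top-ranked feature, then repeatedly
    add the feature with the best score minus redundancy to those already chosen."""
    n = len(scores)
    selected = [max(range(n), key=lambda i: scores[i])]
    while len(selected) < k:
        remaining = [j for j in range(n) if j not in selected]
        best = max(
            remaining,
            key=lambda j: scores[j] - sum(redundancy[i][j] for i in selected),
        )
        selected.append(best)
    return sorted(selected)

def exhaustive_select(scores, redundancy, k):
    """Global selection by brute force over all k-subsets (toy sizes only)."""
    n = len(scores)
    best = max(
        combinations(range(n), k),
        key=lambda subset: objective(subset, scores, redundancy),
    )
    return list(best)

# Made-up example: feature 0 has the highest score but is highly
# redundant with features 1 and 2, which are not redundant with each other.
scores = [1.0, 0.9, 0.9, 0.2]
redundancy = [
    [0.0, 0.8, 0.8, 0.0],
    [0.8, 0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

greedy_subset = greedy_select(scores, redundancy, k=2)
global_subset = exhaustive_select(scores, redundancy, k=2)
print("greedy:", greedy_subset, objective(greedy_subset, scores, redundancy))
print("global:", global_subset, objective(global_subset, scores, redundancy))
```

In this toy case the greedy procedure commits to the top-ranked feature first and cannot recover the better pair of slightly lower-ranked, mutually non-redundant features. That is the kind of suboptimality the abstract attributes to sequential selection.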
