Integration of dense subgraph finding with feature clustering for unsupervised feature selection

In this article, a dense-subgraph-finding approach is adopted for the unsupervised feature selection problem. The feature set of the data is mapped to a graph representation in which individual features constitute the vertex set and inter-feature mutual information defines the edge weights. Feature selection then proceeds in two phases. In the first phase, the densest subgraph is obtained, using an approximation algorithm, so that the selected features are maximally non-redundant with respect to one another. In the second phase, the remaining features are clustered around these non-redundant features to produce the final reduced feature set. Empirically, the proposed approach is found to be competitive with several state-of-the-art unsupervised feature selection algorithms.
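The two phases above can be sketched as follows. This is a minimal illustration, not the paper's exact method: it assumes the classic greedy peeling 2-approximation for the densest subgraph (repeatedly remove the vertex of minimum weighted degree and keep the densest prefix), and it assumes edge weights have been transformed (e.g., into an MI-based dissimilarity) so that a dense subgraph corresponds to mutually non-redundant features; the paper's precise weighting and clustering criteria may differ. The function names and the nearest-representative clustering rule are hypothetical.

```python
import numpy as np

def greedy_densest_subgraph(W):
    """Greedy 2-approximation for the densest subgraph.

    W: symmetric weight matrix with zero diagonal (assumed here to be an
    MI-derived dissimilarity between features). Repeatedly removes the
    vertex of minimum weighted degree and returns the intermediate vertex
    set with maximum density (total edge weight / number of vertices).
    """
    n = W.shape[0]
    deg = W.sum(axis=1)          # weighted degrees in the current subgraph
    total = W.sum() / 2.0        # total edge weight of the current subgraph
    current = set(range(n))
    best_density, best_set = -1.0, []
    while current:
        density = total / len(current)
        if density > best_density:
            best_density, best_set = density, sorted(current)
        v = min(current, key=lambda u: deg[u])  # peel min-degree vertex
        total -= deg[v]
        for u in current:
            deg[u] -= W[u, v]
        current.remove(v)
    return best_set, best_density

def cluster_around(selected, S):
    """Hypothetical second phase: assign every non-selected feature to its
    most similar selected feature, using a similarity matrix S (e.g., raw
    mutual information). Returns {representative: [assigned features]}."""
    clusters = {j: [] for j in selected}
    for i in range(S.shape[0]):
        if i not in clusters:
            rep = max(selected, key=lambda j: S[i, j])
            clusters[rep].append(i)
    return clusters
```

With a toy 4-feature graph where features 0, 1, 2 form a heavy triangle and feature 3 is weakly attached, the peeling procedure returns {0, 1, 2} as the densest subgraph; these become the representatives around which the remaining features are clustered.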
