Feature selection with SVD entropy: Some modification and extension

Many approaches have been developed for dimensionality reduction; they can broadly be categorized into supervised and unsupervised methods. In supervised dimensionality reduction, the target value (possibly a class label) is known for every input vector, and the objective is to select a subset of features with adequate discriminating power to predict that target. In an unsupervised approach, no target values are available; instead, the aim is to find a subset of features that captures the inherent "structure" of the data, such as neighborhood relations or cluster structure. In this work, we first study a Singular Value Decomposition (SVD) based unsupervised feature selection approach proposed by Varshavsky et al. We then propose a modification of this method to improve its performance, and we also develop an SVD-entropy based supervised feature selection algorithm. The algorithms are evaluated on 13 benchmark data sets and one synthetic data set. The quality of the selected features is assessed using three indices: Sammon's Error (SE), Cluster Preservation Index (CPI), and the MisClassification Error (MCE) of a 1-Nearest Neighbor (1-NN) classifier. Besides showing the improvement of the modified unsupervised scheme over the existing one, we compare the modified unsupervised algorithm with a well-known unsupervised feature selection method, and the proposed supervised algorithm with two popular supervised methods. Our results reveal the effectiveness of the proposed algorithms in selecting relevant features.
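The SVD-entropy underlying these methods, as defined by Varshavsky et al., measures how evenly the "energy" of a data matrix is spread across its singular values; a feature's contribution can then be scored by the leave-one-out change in that entropy. The following is a minimal sketch of that idea in NumPy (function names and the features-as-rows convention are our illustrative choices, not the paper's notation):

```python
import numpy as np

def svd_entropy(A):
    """Normalized SVD-entropy of matrix A, following Varshavsky et al. (2006):
    E = -(1/log N) * sum_j V_j log V_j, with V_j = s_j^2 / sum_k s_k^2,
    where s_j are the singular values of A and N is their number."""
    s = np.linalg.svd(A, compute_uv=False)
    v = (s ** 2) / np.sum(s ** 2)
    v = v[v > 0]                         # treat 0 * log(0) as 0
    return -np.sum(v * np.log(v)) / np.log(len(s))

def ce_scores(A):
    """Leave-one-out contribution of each feature (row of A) to the entropy:
    CE_i = E(A) - E(A with feature i removed). High CE_i marks a feature
    that adds structure; this ranking drives the unsupervised selection."""
    e_full = svd_entropy(A)
    return np.array([e_full - svd_entropy(np.delete(A, i, axis=0))
                     for i in range(A.shape[0])])
```

For intuition: a matrix whose singular values are all equal (e.g. the identity) has entropy 1, while a rank-one matrix, whose energy sits in a single singular value, has entropy 0.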

[1] Ashwin Ram, et al. Efficient Feature Selection in Conceptual Clustering, 1997, ICML.

[2] Tomomi Matsui, et al. An Analysis of Dinkelbach's Algorithm for 0-1 Fractional Programming Problems, 1992.

[3] Pedro Larrañaga, et al. Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks, 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[4] L. Hubert, et al. Comparing partitions, 1985.

[5] I. Stancu-Minasian. Nonlinear Fractional Programming, 1997.

[6] I. Chung, et al. Identification of Single- and Multiple-Class Specific Signature Genes from Gene Expression Profiles by Group Marker Index, 2011, PLoS ONE.

[7] Michal Linial, et al. Novel Unsupervised Feature Filtering of Biological Data, 2006, ISMB.

[8] Manoranjan Dash, et al. Dimensionality reduction of unsupervised data, 1997, Proceedings Ninth IEEE International Conference on Tools with Artificial Intelligence.

[9] D. Botstein, et al. Singular value decomposition for genome-wide expression data processing and modeling, 2000, Proceedings of the National Academy of Sciences of the United States of America.

[10] Douglas H. Fisher, et al. Knowledge Acquisition Via Incremental Conceptual Clustering, 1987, Machine Learning.

[11] Chris H. Q. Ding, et al. Minimum redundancy feature selection from microarray gene expression data, 2003, Proceedings of the 2003 IEEE Bioinformatics Conference (CSB2003).

[12] N. Pal, et al. Evolutionary methods for unsupervised feature selection using Sammon's stress function, 2010.

[13] M. Hestenes. Inversion of Matrices by Biorthogonalization and Related Results, 1958.

[14] Chin-Teng Lin, et al. Discovery of dominant and dormant genes from expression data using a novel generalization of SNR for multi-class problems, 2008, BMC Bioinformatics.

[15] Kezhi Mao. Identifying critical variables of principal components for unsupervised feature selection, 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[16] Jing Liu, et al. Unsupervised Feature Selection Using Nonnegative Spectral Analysis, 2012, AAAI.

[17] Daphne Koller, et al. Toward Optimal Feature Selection, 1996, ICML.

[18] Christos Boutsidis, et al. Unsupervised Feature Selection for the k-means Clustering Problem, 2009, NIPS.

[19] Hongdong Li, et al. Supervised dimensionality reduction via sequential semidefinite programming, 2008, Pattern Recognit.

[20] Nikhil R. Pal, et al. Discovering biomarkers from gene expression data for predicting cancer subgroups using neural networks and relational fuzzy clustering, 2007, BMC Bioinformatics.

[21] Rajat Raina, et al. Efficient sparse coding algorithms, 2006, NIPS.

[22] Lei Wang, et al. Feature Selection With Redundancy-Constrained Class Separability, 2010, IEEE Transactions on Neural Networks.

[23] Yuchou Chang, et al. Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm, 2008, Pattern Recognit.

[24] Jack Dongarra, et al. LAPACK Users' Guide, 3rd ed., 1999.

[25] Jitendra Malik, et al. Normalized cuts and image segmentation, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26] Edward R. Dougherty, et al. Small Sample Issues for Microarray-Based Classification, 2001, Comparative and Functional Genomics.

[27] Colin Studholme, et al. An overlap invariant entropy measure of 3D medical image alignment, 1999, Pattern Recognit.

[28] Mark A. Gluck, et al. Information, Uncertainty and the Utility of Categories, 1985.

[29] Zi Huang, et al. ℓ2,1-Norm Regularized Discriminative Feature Selection for Unsupervised Learning, 2011, IJCAI.

[30] S. Dudoit, et al. Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data, 2002.

[31] Xin Zhou, et al. MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data, 2007, Bioinform.

[32] Ron Kohavi, et al. Wrappers for Feature Subset Selection, 1997, Artif. Intell.

[33] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, 1999, Science.

[34] Francis R. Bach, et al. Structured Variable Selection with Sparsity-Inducing Norms, 2009, J. Mach. Learn. Res.

[35] Luis Talavera. Dependency-based feature selection for clustering symbolic data, 2000, Intell. Data Anal.

[36] Werner Dinkelbach. On Nonlinear Fractional Programming, 1967.

[37] David K. Smith. Theory of Linear and Integer Programming, 1987.

[38] Nikhil R. Pal. A fuzzy rule based approach to identify biomarkers for diagnostic classification of cancers, 2007, 2007 IEEE International Fuzzy Systems Conference.

[39] John W. Sammon. A Nonlinear Mapping for Data Structure Analysis, 1969, IEEE Transactions on Computers.

[40] Petros Drineas, et al. CUR matrix decompositions for improved data analysis, 2009, Proceedings of the National Academy of Sciences.

[41] S. Chatterjee, et al. Influential Observations, High Leverage Points, and Outliers in Linear Regression, 1986.

[42] Jason Weston, et al. Gene Selection for Cancer Classification using Support Vector Machines, 2002, Machine Learning.

[43] Deng Cai, et al. Laplacian Score for Feature Selection, 2005, NIPS.