Unsupervised feature selection for multi-cluster data

Many data analysis tasks involve very high-dimensional data. Feature selection techniques are designed to find a relevant subset of the original features that facilitates clustering, classification, and retrieval. In this paper, we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult because there are no class labels to guide the search for relevant information. Feature selection is essentially a combinatorial optimization problem and is therefore computationally expensive. Traditional unsupervised feature selection methods address this issue by selecting the top-ranked features according to scores computed independently for each feature. These approaches neglect possible correlations between features and thus cannot produce an optimal feature subset. Inspired by recent developments in manifold learning and L1-regularized models for subset selection, we propose a new approach to unsupervised feature selection, called Multi-Cluster Feature Selection (MCFS). Specifically, we select the features that best preserve the multi-cluster structure of the data. The corresponding optimization problem can be solved efficiently, since it involves only a sparse eigen-problem and an L1-regularized least squares problem. Extensive experiments on various real-life data sets demonstrate the superiority of the proposed algorithm.
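The two computational steps named above (an eigen-problem on a graph Laplacian, followed by L1-regularized least squares) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made for illustration: a dense Gaussian affinity with a median-distance bandwidth stands in for a sparse neighbor graph, the trivial first eigenvector is dropped, features are standardized before regression, a simple coordinate-descent solver handles the L1 problem, and each feature is scored by its largest absolute coefficient across the embedding vectors. The names `spectral_embedding`, `lasso_cd`, `mcfs_scores`, `n_vectors`, and `alpha` are hypothetical.

```python
import numpy as np

def spectral_embedding(X, n_vectors):
    """Smallest non-trivial solutions of L y = lambda D y (dense toy version)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    # Assumption: dense Gaussian affinity, bandwidth from the median squared distance.
    W = np.exp(-d2 / (2.0 * np.median(d2)))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian; its eigenvectors u give y = D^{-1/2} u.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order
    return d_inv_sqrt[:, None] * vecs[:, 1:n_vectors + 1]  # skip trivial vector

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - X a||^2 + alpha * ||a||_1."""
    n, m = X.shape
    a = np.zeros(m)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # residual y - X a, with a = 0 initially
    for _ in range(n_iter):
        for j in range(m):
            r += X[:, j] * a[j]  # remove feature j's current contribution
            rho = X[:, j] @ r / n
            a[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * a[j]  # add the updated contribution back
    return a

def mcfs_scores(X, n_vectors, alpha=0.1):
    """Score each feature by its largest |coefficient| over all embedding vectors."""
    Y = spectral_embedding(X, n_vectors)
    # Assumption: standardize both sides so a single alpha is meaningful.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    coefs = np.column_stack(
        [lasso_cd(Xs, Ys[:, k], alpha) for k in range(n_vectors)]
    )
    return np.abs(coefs).max(axis=1)

# Toy data: two clusters separated along features 0 and 1; features 2-5 are noise.
rng = np.random.default_rng(0)
labels = np.repeat([0.0, 1.0], 30)
X = rng.normal(size=(60, 6))
X[:, 0] += 6.0 * labels
X[:, 1] -= 6.0 * labels

scores = mcfs_scores(X, n_vectors=1)
selected = np.argsort(scores)[::-1][:2]  # indices of the two top-scoring features
print(sorted(selected.tolist()))
```

Because the regression is fit jointly over all features, correlated features compete for the same coefficient mass, which is the property that per-feature scoring lacks. On larger data, a sparse neighbor graph with a sparse eigensolver would replace the dense `eigh` call used here.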
