An Efficient Greedy Method for Unsupervised Feature Selection

In data mining applications, data instances are typically described by a huge number of features. Most of these features are irrelevant or redundant, which negatively affects the efficiency and effectiveness of different learning algorithms. Selecting relevant features is a crucial task that can lead to a better understanding of the data or improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels remains a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm relies on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison to state-of-the-art methods for unsupervised feature selection.
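The greedy scheme described above can be sketched in code. The following is an illustrative implementation only: it scores each candidate feature by how much projecting onto it reduces the Frobenius reconstruction error of the data matrix, then deflates the residual, using a straightforward residual-deflation update rather than the paper's own recursive formula. The function name `greedy_select` and the numerical tolerance are assumptions, not from the paper.

```python
import numpy as np

def greedy_select(A, k):
    """Illustrative greedy unsupervised feature selection (sketch).

    Selects k columns (features) of the data matrix A so that projecting
    A onto the span of the selected columns yields a small Frobenius
    reconstruction error ||A - P_S A||_F^2.  At each step the column
    giving the largest error reduction is chosen, and its contribution
    is then removed (deflated) from the residual matrix.
    """
    E = np.asarray(A, dtype=float).copy()   # residual matrix
    selected = []
    for _ in range(k):
        col_norms = np.sum(E * E, axis=0)
        # gain of column j: ||E^T f_j||^2 / ||f_j||^2, where f_j = E[:, j]
        gains = np.sum((E.T @ E) ** 2, axis=0) / np.maximum(col_norms, 1e-12)
        gains[selected] = -np.inf           # never re-select a feature
        j = int(np.argmax(gains))
        selected.append(j)
        f = E[:, j]
        # deflate: remove from every column its component along f
        E = E - np.outer(f, f @ E) / (f @ f)
    return selected
```

For example, if one column of `A` is an exact linear combination of two others, selecting those two columns already reconstructs `A` with near-zero error, and the greedy scores reflect that after a single deflation step.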
