Reconstruction-based Unsupervised Feature Selection: An Embedded Approach

Feature selection has proven effective and efficient in preparing high-dimensional data for data mining and machine learning. Since real-world data is usually unlabeled, unsupervised feature selection has received increasing attention in recent years. Without label information, unsupervised feature selection needs alternative criteria to define feature relevance. Recently, data reconstruction error emerged as a new criterion for unsupervised feature selection, which defines feature relevance as the capability of features to approximate the original data via a reconstruction function. Most existing algorithms in this family assume predefined, linear reconstruction functions. However, the reconstruction function should be data dependent and may not always be linear, especially when the original data is high-dimensional. In this paper, we investigate how to learn the reconstruction function from the data automatically for unsupervised feature selection, and propose a novel reconstruction-based unsupervised feature selection framework, REFS, which embeds the learning of the reconstruction function into the feature selection process. Experiments on various types of real-world datasets demonstrate the effectiveness of the proposed framework.
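The reconstruction-error criterion described above can be illustrated with a minimal sketch. The abstract notes that REFS learns the reconstruction function from data (it need not be linear); the sketch below instead uses the fixed linear least-squares reconstruction that most prior methods in this family assume, purely to show how the criterion scores feature subsets. All function names and the toy data are hypothetical, not part of the paper.

```python
import numpy as np

def reconstruction_error(X, idx):
    """Squared error of reconstructing all features of X from the subset idx,
    using a linear least-squares reconstruction function (the assumption made
    by most prior reconstruction-based methods)."""
    S = X[:, idx]                               # n x k selected features
    W, *_ = np.linalg.lstsq(S, X, rcond=None)   # k x d reconstruction weights
    return np.linalg.norm(X - S @ W) ** 2

def greedy_select(X, k):
    """Greedily pick k features that minimize the reconstruction error."""
    selected = []
    for _ in range(k):
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        best = min(remaining,
                   key=lambda j: reconstruction_error(X, selected + [j]))
        selected.append(best)
    return selected

# Toy data: two latent factors generate four observed features, so two
# well-chosen features suffice to reconstruct the rest almost exactly.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))
X = np.hstack([Z, Z @ rng.normal(size=(2, 2)) + 0.01 * rng.normal(size=(100, 2))])
sel = greedy_select(X, 2)
print(sel, reconstruction_error(X, sel))
```

Because the reconstruction function here is fixed and linear, the sketch corresponds to the baseline setting the paper argues against; REFS replaces the least-squares step with a reconstruction function learned jointly with the selection.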
