Joint Embedding Learning and Sparse Regression: A Framework for Unsupervised Feature Selection

Feature selection has attracted considerable research interest over the last few decades. Traditional learning-based feature selection methods separate embedding learning from feature ranking. In this paper, we propose a novel unsupervised feature selection framework, termed joint embedding learning and sparse regression (JELSR), in which embedding learning and sparse regression are performed jointly. To demonstrate the effectiveness of the framework, we instantiate it with weights computed via local linear approximation together with an ℓ2,1-norm regularizer, and design an efficient algorithm to solve the resulting optimization problem. We further discuss the proposed approach in depth, including its convergence analysis, computational complexity, and parameter determination. Overall, the framework not only provides a new perspective on traditional methods but also suggests directions for further research on feature selection. Compared with traditional unsupervised feature selection methods, our approach integrates the merits of embedding learning and sparse regression. Promising experimental results on several kinds of data sets, including image, voice, and biological data, validate the effectiveness of the proposed algorithm.
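The sparse regression component described above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it assumes a fixed low-dimensional embedding `Y` and solves the standard ℓ2,1-regularized regression min_W ||XW − Y||²_F + β·||W||_{2,1} by iterative reweighting, where the diagonal weight matrix D with D_ii = 1/(2·||w_i||₂) is the usual surrogate for the non-smooth row-sparsity term. Function names, the parameter `beta`, and the iteration count are illustrative choices, not part of the paper.

```python
import numpy as np

def l21_norm(W):
    # sum of the l2 norms of the rows of W
    return np.sum(np.sqrt(np.sum(W ** 2, axis=1)))

def sparse_regression(X, Y, beta=1.0, n_iter=50, eps=1e-8):
    """Iteratively reweighted solver (sketch) for
           min_W ||X W - Y||_F^2 + beta * ||W||_{2,1}
    X: (n_samples, n_features) data, Y: (n_samples, n_dims) fixed embedding.
    eps guards against division by zero for all-zero rows of W."""
    n_features = X.shape[1]
    XtX = X.T @ X
    XtY = X.T @ Y
    D = np.eye(n_features)            # reweighting matrix, initialized to I
    W = np.zeros((n_features, Y.shape[1]))
    for _ in range(n_iter):
        # closed-form update with D fixed: W = (X^T X + beta D)^{-1} X^T Y
        W = np.linalg.solve(XtX + beta * D, XtY)
        # refresh D from the current row norms of W
        row_norms = np.sqrt(np.sum(W ** 2, axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
    return W

def feature_ranking(W):
    # features whose rows of W have larger l2 norm are ranked higher
    return np.argsort(-np.sqrt(np.sum(W ** 2, axis=1)))
```

In a full JELSR-style pipeline the embedding `Y` would itself be updated in alternation with `W`; here it is held fixed so the sparse-regression step can be seen in isolation. Larger `beta` drives more rows of `W` toward zero, which is what makes the row norms usable as a feature-ranking score.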
