Unsupervised feature selection via low-rank approximation and structure learning

Feature selection is an important research topic in machine learning and computer vision because it reduces the dimensionality of the input data and improves the performance of learning algorithms. Low-rank approximation techniques can effectively exploit the low-rank property of input data, which is consistent with the premise underlying dimensionality reduction. In this paper, we propose an efficient unsupervised feature selection algorithm that incorporates both low-rank approximation and structure learning. First, using the self-representation of the data matrix, we formulate feature selection as a matrix factorization problem with low-rank constraints; the formulation also embeds a structure-learning regularizer and a sparsity-inducing term. Second, we present an effective technique for approximating the low-rank constraints and propose a convergent batch-mode algorithm. This technique can also serve as an algorithmic framework for general low-rank recovery problems. Finally, we validate the proposed algorithm on twelve publicly available datasets from machine learning repositories. Extensive experimental results demonstrate that, in terms of clustering performance, the proposed method is competitive with existing state-of-the-art feature selection methods.
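The abstract does not spell out the objective, so what follows is a hedged reconstruction rather than the paper's formulation. In self-representation-based feature selection (e.g., regularized self-representation), each feature is reconstructed from the others, X ≈ XW, and the row norms of W score feature importance; a low-rank-constrained variant with a structure-learning regularizer might take a form such as

\[
\min_{W,\,S}\ \|X - XW\|_F^2 \;+\; \alpha\,\|W\|_{2,1} \;+\; \beta\,\operatorname{tr}\!\big((XW)^\top L_S\,(XW)\big)
\quad \text{s.t.}\ \operatorname{rank}(W) \le r,
\]

where X ∈ R^{n×d} is the data matrix, the ℓ2,1 norm induces row sparsity, L_S is the Laplacian of a learned sample-similarity matrix S, and α, β, and r are illustrative symbols, not the paper's notation. A common way to handle the rank constraint, and one plausible reading of the matrix factorization described above, is an explicit factorization W = AB with A ∈ R^{d×r} and B ∈ R^{r×d}. The sketch below implements this reading with plain gradient steps and a fixed k-NN Laplacian standing in for learned structure; it is a minimal illustration under these assumptions, not the paper's algorithm.

```python
import numpy as np

# Hedged sketch (NOT the paper's method): self-representation feature
# selection with an explicit rank-r factorization W = A @ B, an l2,1
# row-sparsity penalty on W, and a fixed graph-Laplacian regularizer
# standing in for learned structure. alpha, beta, r, lr are illustrative.

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian from a symmetrized k-NN affinity."""
    n = X.shape[0]
    D2 = np.square(X[:, None] - X[None]).sum(-1)      # pairwise sq. distances
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]             # k nearest, skip self
        S[i, nbrs] = np.exp(-D2[i, nbrs] / (D2[i, nbrs].mean() + 1e-12))
    S = (S + S.T) / 2                                 # symmetrize affinity
    return np.diag(S.sum(1)) - S

def select_features(X, r=10, alpha=1.0, beta=0.1, lr=1e-4, iters=1000):
    """Return feature indices ranked by the row norms of W = A @ B."""
    X = (X - X.mean(0)) / (X.std(0) + 1e-12)          # standardize for stability
    n, d = X.shape
    rng = np.random.default_rng(0)
    A = 0.01 * rng.standard_normal((d, r))            # W = A @ B has rank <= r
    B = 0.01 * rng.standard_normal((r, d))
    L = knn_laplacian(X)
    XtX, XtLX = X.T @ X, X.T @ L @ X                  # precompute d x d terms
    for _ in range(iters):
        W = A @ B
        row = np.sqrt((W * W).sum(1)) + 1e-8          # row norms for l2,1 term
        G = (-2 * (XtX - XtX @ W)                     # d/dW ||X - XW||_F^2
             + alpha * (W / row[:, None])             # d/dW ||W||_{2,1}
             + 2 * beta * XtLX @ W)                   # d/dW tr((XW)' L (XW))
        gA, gB = G @ B.T, A.T @ G                     # chain rule through W = A @ B
        A -= lr * gA
        B -= lr * gB
    scores = np.linalg.norm(A @ B, axis=1)            # feature importance
    return np.argsort(scores)[::-1]                   # ranked feature indices
```

After optimization, features are ranked by the row norms of W = AB and the top-ranked ones are retained, which mirrors how ℓ2,1-regularized self-representation methods are typically used for selection; the paper's actual optimizer is described as a convergent batch-mode algorithm, which these plain gradient steps only approximate.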
