Sparse PCA via $l_{2, p}$-Norm Regularization for Unsupervised Feature Selection

In data mining, handling high-dimensional data is an unavoidable problem. Unsupervised feature selection has attracted increasing attention because it does not rely on labels. The performance of spectral-based unsupervised methods depends on the quality of the constructed similarity matrix, which is used to depict the intrinsic structure of the data. However, real-world data contain many noisy samples and features, so a similarity matrix constructed from the original data cannot be completely reliable. Worse still, the size of the similarity matrix grows rapidly as the number of samples increases, which significantly raises the computational cost. Inspired by principal component analysis, we propose a simple and efficient unsupervised feature selection method that combines the reconstruction error with $l_{2,p}$-norm regularization. The projection matrix, which is used for feature selection, is learned by minimizing the reconstruction error under the sparse constraint. We then present an efficient optimization algorithm to solve the proposed unsupervised model, and theoretically analyse the convergence and computational complexity of the algorithm. Finally, extensive experiments on real-world data sets demonstrate the effectiveness of the proposed method.
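
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of a PCA-style reconstruction error combined with an $l_{2,p}$-norm penalty, followed by feature ranking from the rows of the projection matrix. The abstract does not state the exact objective, so the form $\|X - XWW^T\|_F^2 + \lambda \sum_i \|w_i\|_2^p$, the function names, and the parameters `lam` and `p` are assumptions made for illustration only. The paper's optimization algorithm is not reproduced here.

```python
import numpy as np


def l2p_norm_p(W, p):
    """Sum of the p-th powers of the row-wise Euclidean norms of W.

    This is the l_{2,p}-norm raised to the power p, a common row-sparsity
    penalty (assumed form; the abstract does not spell it out).
    """
    row_norms = np.linalg.norm(W, axis=1)
    return float(np.sum(row_norms ** p))


def objective(X, W, lam, p):
    """Assumed objective: reconstruction error plus an l_{2,p} penalty.

    X is (n_samples, n_features); W is (n_features, k), a projection matrix.
    """
    reconstruction = X @ W @ W.T
    error = np.linalg.norm(X - reconstruction, "fro") ** 2
    return float(error + lam * l2p_norm_p(W, p))


def rank_features(W):
    """Order feature indices by decreasing row norm of W.

    Rows of W with larger norm are treated as more important features.
    """
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores)
```

In this sketch, a feature whose row in `W` is driven to zero by the penalty contributes nothing to the projection, which is why the row norms can serve as feature scores. Selecting the top-ranked indices returned by `rank_features` would then give the chosen feature subset.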
