Global and local structure preserving sparse subspace learning: An iterative approach to unsupervised feature selection

Subspace learning has become increasingly popular as a means of alleviating the curse of dimensionality. Existing approaches exploit either the global or the local structure of the data; few studies consider both simultaneously, even though each carries important information. In this paper, we propose a global and local structure preserving sparse subspace learning (GLoSS) model for unsupervised feature selection. The model realizes feature selection and subspace learning simultaneously. We first formulate a generic combinatorial model and develop a greedy algorithm to solve it; we then solve the relaxed continuous GLoSS problem with an iterative strategy based on accelerated block coordinate descent. We also prove that the whole iterate sequence generated by the proposed iterative algorithm converges. Extensive experiments on real-world datasets demonstrate the superiority of the proposed approach over several state-of-the-art unsupervised feature selection methods.

Highlights
- We propose a novel sparse subspace learning model for unsupervised feature selection.
- We propose a greedy algorithm to solve the combinatorial model.
- We propose an iterative algorithm to solve the relaxed continuous model.
- We establish whole-sequence convergence of the iterates of the iterative algorithm.
- We conduct extensive experimental studies of the proposed algorithms.
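To make the iterative strategy concrete, the sketch below illustrates the general mechanics of accelerated (extrapolated) proximal block coordinate descent: each block of variables is updated in turn by a proximal gradient step taken at an extrapolated point. The objective used here, 0.5·||X − UV||_F^2 + λ·||V||_1, together with the variable names and step-size choices, is an illustrative assumption and not the paper's actual GLoSS formulation.

```python
# Minimal sketch of accelerated (extrapolated) proximal block coordinate descent.
# The objective 0.5*||X - U V||_F^2 + lam*||V||_1 is a stand-in illustration,
# NOT the paper's GLoSS model; names and step sizes are assumptions.
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau*||.||_1 (entrywise soft-thresholding)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def accelerated_bcd(X, rank=5, lam=0.1, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((rank, d))
    U_prev, V_prev = U.copy(), V.copy()
    t_prev = 1.0
    for _ in range(n_iter):
        # FISTA-style extrapolation weight, shared by both blocks.
        t = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2))
        w = (t_prev - 1.0) / t

        # Block 1: gradient step for U at an extrapolated point.
        U_hat = U + w * (U - U_prev)
        grad_U = (U_hat @ V - X) @ V.T
        L_U = max(np.linalg.norm(V @ V.T, 2), 1e-12)  # Lipschitz constant w.r.t. U
        U_prev, U = U, U_hat - grad_U / L_U

        # Block 2: proximal gradient step for V at an extrapolated point.
        V_hat = V + w * (V - V_prev)
        grad_V = U.T @ (U @ V_hat - X)
        L_V = max(np.linalg.norm(U.T @ U, 2), 1e-12)  # Lipschitz constant w.r.t. V
        V_prev, V = V, soft_threshold(V_hat - grad_V / L_V, lam / L_V)

        t_prev = t
    return U, V

# Example usage on random data:
# X = np.random.default_rng(1).standard_normal((100, 40))
# U, V = accelerated_bcd(X, rank=5, lam=0.1)
```

In a GLoSS-style setting, the same template would presumably apply, with the smooth data-fitting and structure-preserving terms handled by the gradient step and the sparsity-inducing regularizer handled by the proximal step.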
