Sparse approximation to discriminant projection learning and application to image classification

Abstract: Subspace learning for dimensionality reduction is an important topic in pattern analysis and machine learning, with extensive applications in feature representation and image classification. Linear discriminant analysis (LDA) is a well-known subspace learning approach for supervised dimensionality reduction, valued for its effectiveness in discriminant analysis. However, LDA is unstable and suffers from the singularity problem when applied to high-dimensional data with small sample sizes. In this paper, we develop a novel subspace learning model, named sparse approximation to discriminant projection learning (SADPL), to learn a sparse projection matrix. Unlike traditional LDA-based methods, we learn the projection matrix from a new objective function rather than the Fisher criterion, which avoids the matrix singularity problem. To identify which features play an important role in discriminant analysis, we embed a feature selection framework into the subspace learning model to select the informative features. The resulting objective function is convex and can be solved by an effective optimization algorithm, whose convergence we prove theoretically. Extensive experiments on a range of image classification tasks, including face recognition, palmprint recognition, object categorization, and texture classification, show that SADPL achieves competitive performance compared to state-of-the-art methods.
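The singularity problem mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: the data are synthetic, and the helper name `within_class_scatter` and the sizes chosen are assumptions made only for illustration. With n samples in c classes, the within-class scatter matrix has rank at most n - c, so it cannot be inverted when the feature dimension exceeds that bound, which is the situation the Fisher criterion runs into on small-sample, high-dimensional data.

```python
import numpy as np

def within_class_scatter(X, y):
    """Sum over classes of (x - class_mean)(x - class_mean)^T (hypothetical helper)."""
    dim = X.shape[1]
    S_w = np.zeros((dim, dim))
    for label in np.unique(y):
        Xc = X[y == label]
        centered = Xc - Xc.mean(axis=0)   # remove the class mean
        S_w += centered.T @ centered
    return S_w

# Synthetic small-sample, high-dimensional setting (assumed sizes).
rng = np.random.default_rng(0)
n_classes, per_class, dim = 3, 4, 50      # 12 samples, 50 features
X = rng.normal(size=(n_classes * per_class, dim))
y = np.repeat(np.arange(n_classes), per_class)

S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)
n_samples = X.shape[0]

# The rank is bounded by n - c, far below the 50 needed for invertibility.
print(f"dim = {dim}, rank(S_w) = {rank}, bound n - c = {n_samples - n_classes}")
```

Because the rank falls short of the dimension, the inverse of the scatter matrix required by the classical Fisher criterion does not exist here, which is the motivation the abstract gives for replacing that criterion with a different objective.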
