Joint Feature Selection and Subspace Learning

Dimensionality reduction is an important topic in machine learning and is generally divided into two categories: feature selection and subspace learning. Over the past decades, many methods have been proposed for dimensionality reduction, but most of them study feature selection and subspace learning independently. In this paper, we present a framework for joint feature selection and subspace learning. We reformulate the subspace learning problem and impose an L2,1-norm penalty on the projection matrix to achieve row sparsity, which selects relevant features and learns the transformation simultaneously. We discuss two instantiations of the proposed framework and present their optimization algorithms. Experiments on benchmark face recognition data sets show that the proposed framework consistently outperforms state-of-the-art methods.

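The paper's exact objective is not reproduced here, but the row-sparsity mechanism it relies on can be illustrated with a standard L2,1-regularized least-squares problem solved by iterative reweighting. The following is a minimal NumPy sketch under that assumption; the data matrix X, target matrix Y, regularization weight lam, and helper names are illustrative, not the authors' implementation.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (the L2,1-norm)."""
    return np.sum(np.linalg.norm(W, axis=1))

def joint_fs_subspace(X, Y, lam=0.1, n_iter=50, eps=1e-8):
    """Minimize ||X W - Y||_F^2 + lam * ||W||_{2,1} by iterative reweighting.

    X: (n_samples, n_features) data matrix
    Y: (n_samples, n_components) target/embedding matrix
    Returns a projection W with approximately row-sparse structure:
    rows with near-zero norm correspond to features that can be discarded.
    """
    W = np.linalg.lstsq(X, Y, rcond=None)[0]           # warm start
    for _ in range(n_iter):
        # Diagonal reweighting matrix: D_ii = 1 / (2 * ||w_i||_2)
        row_norms = np.linalg.norm(W, axis=1) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
        # Closed-form update of the reweighted least-squares subproblem
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
    return W

# Toy usage: rank features by the row norms of the learned projection
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Y = X[:, :3] @ rng.standard_normal((3, 2))             # only 3 features matter
W = joint_fs_subspace(X, Y, lam=1.0)
selected = np.argsort(np.linalg.norm(W, axis=1))[::-1][:3]
print("top features:", selected)
```

Because the L2,1 penalty couples all columns of W within each row, a row is driven toward zero as a whole, so feature selection and learning of the projection happen in the same optimization rather than as separate pre- and post-processing steps.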