Global and Local Structure Preservation for Feature Selection

The recent literature indicates that preserving global pairwise sample similarity is of great importance for feature selection and that many existing selection criteria essentially work in this way. In this paper, we argue that, besides global pairwise sample similarity, the local geometric structure of data is also critical and that these two factors play different roles in different learning scenarios. To show this, we propose a global and local structure preservation framework for feature selection (GLSPFS) that integrates global pairwise sample similarity and local geometric data structure to conduct feature selection. To demonstrate the generality of the framework, we employ well-known methods from the literature to model the local geometric data structure and develop three specific GLSPFS-based feature selection algorithms. We also develop an efficient optimization algorithm with proven global convergence to solve the resulting feature selection problem. A comprehensive experimental study is then conducted to compare our feature selection algorithms with many state-of-the-art ones in supervised, unsupervised, and semisupervised learning scenarios. The results indicate that: 1) our framework consistently achieves statistically significant improvement in selection performance over the compared algorithms; 2) in supervised and semisupervised learning scenarios, preserving global pairwise similarity is more important than preserving local geometric data structure; 3) in the unsupervised scenario, preserving local geometric data structure becomes clearly more important; and 4) the best feature selection performance is always obtained when the two factors are appropriately integrated. In summary, this paper not only validates the advantages of the proposed GLSPFS framework but also provides more insight into the information that should be preserved in different feature selection tasks.
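To make the idea of combining the two criteria concrete, the following minimal Python sketch scores each feature with a global term (how well the feature's values align with a pairwise similarity, i.e., kernel, matrix) and a local term (how smoothly the feature varies over a k-nearest-neighbor graph, in the spirit of the Laplacian score). This is an illustrative assumption only, not the actual GLSPFS objective or its optimization algorithm; the function names (glspfs_like_scores, knn_laplacian) and the parameters alpha, gamma, and k are hypothetical.

import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared Euclidean distances, then a Gaussian (RBF) kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def knn_laplacian(X, k=5, gamma=1.0):
    # Symmetric k-nearest-neighbor affinity graph and its unnormalized Laplacian.
    K = rbf_kernel(X, gamma)
    n = K.shape[0]
    W = np.zeros_like(K)
    for i in range(n):
        idx = np.argsort(-K[i])[1:k + 1]   # k most similar samples, skipping self
        W[i, idx] = K[i, idx]
    W = np.maximum(W, W.T)                 # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W

def glspfs_like_scores(X, alpha=0.5, k=5, gamma=1.0):
    # Score features by (i) alignment with a global pairwise-similarity matrix
    # and (ii) smoothness on a local neighborhood graph; higher score = better.
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    K = rbf_kernel(Xc, gamma)                                   # global pairwise similarity
    K_c = K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()  # double-centered kernel
    L = knn_laplacian(Xc, k, gamma)                             # local geometric structure
    scores = np.empty(d)
    for j in range(d):
        f = Xc[:, j]
        norm = f @ f + 1e-12
        global_term = (f @ K_c @ f) / (norm * np.linalg.norm(K_c))  # similarity preservation
        local_term = (f @ L @ f) / norm                             # small = locally smooth
        scores[j] = global_term - alpha * local_term
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    X[:, 0] += np.repeat([0.0, 3.0], 50)   # feature 0 carries a two-cluster shift
    s = glspfs_like_scores(X, alpha=0.5, gamma=0.05)
    print("top-ranked features:", np.argsort(-s)[:5])

In this sketch, raising alpha shifts the emphasis toward local structure preservation, which, according to the paper's findings, is expected to matter more in the unsupervised scenario, whereas a smaller alpha emphasizes global pairwise similarity, which matters more in the supervised and semisupervised scenarios.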
