Joint hypergraph learning and sparse regression for feature selection

Abstract: In this paper, we propose a unified framework for improved structure estimation and feature selection. Most existing graph-based feature selection methods rely on a static representation of the data structure, based on the Laplacian matrix of a simple graph. Here, by contrast, we perform data structure learning and feature selection simultaneously. To improve the estimation of the manifold representing the structure of the selected features, we use a higher-order description of the neighborhood structures in the data obtained through hypergraph learning. This allows the features that participate in the most significant higher-order relations to be selected, and the remainder discarded, through a sparsification process. We formulate a single objective function that captures and regularizes both the hypergraph weight estimation and the feature selection process, and we present an optimization algorithm that recovers the hypergraph weights together with a sparse set of feature selection indicators. This approach offers two main advantages. First, by adjusting the hypergraph weights, we preserve higher-order neighborhood relations in the original data that cannot be modeled by a simple graph. Second, our objective function captures the global discriminative structure of the features. Comprehensive experiments on nine benchmark datasets show that our method achieves statistically significant improvements over state-of-the-art feature selection methods, confirming its effectiveness.
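To make the pipeline concrete, the sketch below illustrates one plausible instantiation of the ingredients the abstract describes: k-nearest-neighbour hyperedges, the standard normalized hypergraph Laplacian (in the style of Zhou et al.'s hypergraph learning), a spectral embedding of that Laplacian, and an l2,1-regularized regression solved by iterative reweighting (in the style of Nie et al.). The paper's actual objective function, hyperedge-weight update, and optimization algorithm are not reproduced here, and all function names and parameters (knn_hyperedges, hypergraph_laplacian, sparse_regression, k, beta) are illustrative assumptions rather than the authors' implementation.

    import numpy as np

    def knn_hyperedges(X, k=5):
        # One hyperedge per sample: the sample plus its k nearest neighbours.
        # X is (d features, n samples); the paper's exact hyperedge construction
        # is not specified here, so this is only a common default.
        n = X.shape[1]
        D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise squared distances
        H = np.zeros((n, n))
        for j in range(n):
            H[np.argsort(D2[:, j])[:k + 1], j] = 1.0             # neighbourhood of sample j
        return H                                                  # (n vertices, n hyperedges)

    def hypergraph_laplacian(H, w):
        # Normalized hypergraph Laplacian L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
        # with hyperedge weights w (the construction of Zhou et al.).
        De = H.sum(axis=0)                                        # hyperedge degrees
        Dv = H @ w                                                # weighted vertex degrees
        Dv_is = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
        Theta = Dv_is @ H @ np.diag(w / np.maximum(De, 1e-12)) @ H.T @ Dv_is
        return np.eye(H.shape[0]) - Theta

    def sparse_regression(X, Y, beta=1.0, n_iter=30):
        # l2,1-regularized regression  min_W ||X^T W - Y||_F^2 + beta * ||W||_{2,1},
        # solved by the usual iterative reweighting; rows of W with large norms
        # correspond to selected features.
        d = X.shape[0]
        D = np.eye(d)
        for _ in range(n_iter):
            W = np.linalg.solve(X @ X.T + beta * D, X @ Y)
            D = np.diag(1.0 / (2.0 * (np.sqrt((W ** 2).sum(axis=1)) + 1e-12)))
        return W

    # Illustrative use: rank features by the row norms of W.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 100))                            # 50 features, 100 samples
    H = knn_hyperedges(X, k=5)
    w = np.ones(H.shape[1])                                       # uniform hyperedge weights
    L = hypergraph_laplacian(H, w)
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:6]                                              # 5-d spectral embedding
    W = sparse_regression(X, Y, beta=0.5)
    top_features = np.argsort(-np.linalg.norm(W, axis=1))[:10]

In the proposed framework the hyperedge weights w would themselves be updated jointly with the sparse projection rather than held fixed as above; the snippet only shows the two building blocks the abstract combines.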
