Semi-supervised feature selection via hierarchical regression for web image classification

Abstract Feature selection is an important step for large-scale image data analysis, which has been proved to be difficult due to large size in both dimensions and samples. Feature selection firstly eliminates redundant and irrelevant features and then chooses a subset of features that performs as efficient as the complete set. Generally, supervised feature selection yields better performance than unsupervised feature selection because of the utilization of labeled information. However, labeled data samples are always expensive to obtain, which constraints the performance of supervised feature selection, especially for the large web image datasets. In this paper, we propose a semi-supervised feature selection algorithm that is based on a hierarchical regression model. Our contribution can be highlighted as: (1) Our algorithm utilizes a statistical approach to exploit both labeled and unlabeled data, which preserves the manifold structure of each feature type. (2) The predicted label matrix of the training data and the feature selection matrix are learned simultaneously, making the two aspects mutually benefited. Extensive experiments are performed on three large-scale image datasets. Experimental results demonstrate the better performance of our algorithm, compared with the state-of-the-art algorithms.

[1]  Tao Mei,et al.  Graph-based semi-supervised learning with multiple labels , 2009, J. Vis. Commun. Image Represent..

[2]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[3]  Mikhail Belkin,et al.  Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples , 2006, J. Mach. Learn. Res..

[4]  David G. Stork,et al.  Pattern Classification (2nd ed.) , 1999 .

[5]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[6]  Meng Wang,et al.  MSRA-MM 2.0: A Large-Scale Web Multimedia Dataset , 2009, 2009 IEEE International Conference on Data Mining Workshops.

[7]  Feiping Nie,et al.  Efficient and Robust Feature Selection via Joint ℓ2, 1-Norms Minimization , 2010, NIPS.

[8]  Huan Liu,et al.  Semi-supervised Feature Selection via Spectral Analysis , 2007, SDM.

[9]  Shuicheng Yan,et al.  Near-duplicate keyframe retrieval by semi-supervised learning and nonrigid image matching , 2011, TOMCCAP.

[10]  Zi Huang,et al.  Multi-Feature Fusion via Hierarchical Regression for Multimedia Analysis , 2013, IEEE Transactions on Multimedia.

[11]  Meng Wang,et al.  Visual query suggestion , 2009, ACM Multimedia.

[12]  Yi Yang,et al.  Co-Regularized Ensemble for Feature Selection , 2013, IJCAI.

[13]  Yi Yang,et al.  Interactive Video Indexing With Statistical Active Learning , 2012, IEEE Transactions on Multimedia.

[14]  Zi Huang,et al.  Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence ℓ2,1-Norm Regularized Discriminative Feature Selection for Unsupervised Learning , 2022 .

[15]  Shuicheng Yan,et al.  Robust Image Analysis With Sparse Representation on Quantized Visual Features , 2013, IEEE Transactions on Image Processing.

[16]  Yi Yang,et al.  Ranking with local regression and global alignment for cross media retrieval , 2009, ACM Multimedia.

[17]  David G. Stork,et al.  Pattern Classification , 1973 .

[18]  Meng Wang,et al.  Visual query suggestion , 2010, ACM Trans. Multim. Comput. Commun. Appl..

[19]  Changsheng Xu,et al.  Inductive Robust Principal Component Analysis , 2012, IEEE Transactions on Image Processing.

[20]  Yueting Zhuang,et al.  Heterogeneous feature selection by group lasso with logistic regression , 2010, ACM Multimedia.

[21]  Changsheng Xu,et al.  A Generic Framework for Video Annotation via Semi-Supervised Learning , 2012, IEEE Transactions on Multimedia.

[22]  Paul S. Bradley,et al.  Feature Selection via Concave Minimization and Support Vector Machines , 1998, ICML.

[23]  Nicu Sebe,et al.  Exploiting the entire feature space with sparsity for automatic image annotation , 2011, ACM Multimedia.

[24]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[25]  Zenglin Xu,et al.  Discriminative Semi-Supervised Feature Selection Via Manifold Regularization , 2009, IEEE Transactions on Neural Networks.

[26]  Lei Wang,et al.  Efficient Spectral Feature Selection with Minimum Redundancy , 2010, AAAI.

[27]  Ivor W. Tsang,et al.  Flexible Manifold Embedding: A Framework for Semi-Supervised and Unsupervised Dimension Reduction , 2010, IEEE Transactions on Image Processing.

[28]  Daphne Koller,et al.  Toward Optimal Feature Selection , 1996, ICML.

[29]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Jieping Ye,et al.  Efficient Recovery of Jointly Sparse Vectors , 2009, NIPS.

[31]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.