Exploring the Prediction Consistency of Multiple Views for Transductive Visual Recognition

Although great progress has been made in accurately classifying images, many methods use only labeled images and ignore the large quantity of unlabeled images. To make use of unlabeled images, in this letter we propose a novel transductive visual recognition method based on the prediction consistency of multiple views (T-PCMV). Both labeled and unlabeled images are used in a unified framework. The predictions for unlabeled images are learned by linearly combining the discriminative information of multiple views. We enforce a smoothness constraint: visually similar images should be predicted with similar labels. To learn the classifier, we jointly minimize the classification loss and the discrepancy of predicted labels. To evaluate the proposed method, we conduct transductive visual recognition experiments on four image datasets. The experimental results demonstrate the effectiveness of the proposed T-PCMV method.
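
The abstract does not give the T-PCMV objective itself, so the following is only a minimal sketch of the general idea it describes, under stated assumptions: per-view ridge classifiers stand in for the "discriminative information" of each view, their scores are combined linearly with fixed weights, and a graph-Laplacian term plays the role of the smoothness constraint. The function names, the Gaussian affinity, and the parameters `alphas`, `mu`, `lam`, and `sigma` are all hypothetical and are not taken from the letter.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian similarity between all rows of X (zero diagonal)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def ridge_scores(X, Y_lab, reg=1e-2):
    """Fit a ridge classifier on the labeled rows, score every row of X."""
    Xl = X[: Y_lab.shape[0]]
    A = Xl.T @ Xl + reg * np.eye(X.shape[1])
    weights = np.linalg.solve(A, Xl.T @ Y_lab)
    return X @ weights

def transductive_predict(views, y_lab, n_classes,
                         alphas=None, mu=1.0, lam=1.0, sigma=1.0):
    """Assumed objective (not the letter's):
        sum_{i labeled} ||F_i - Y_i||^2 + mu ||F - F0||^2 + lam tr(F^T L F)
    where F0 is the weighted sum of per-view scores and L is the Laplacian
    of the weighted sum of per-view affinities. The labeled samples must be
    the first len(y_lab) rows of every view."""
    n, n_lab = views[0].shape[0], len(y_lab)
    Y = np.zeros((n, n_classes))
    Y[np.arange(n_lab), y_lab] = 1.0
    if alphas is None:
        alphas = np.full(len(views), 1.0 / len(views))
    # Linear combination of the per-view predictions.
    F0 = sum(a * ridge_scores(X, Y[:n_lab]) for a, X in zip(alphas, views))
    # Smoothness: similar images (in any view) should get similar labels.
    W = sum(a * gaussian_affinity(X, sigma) for a, X in zip(alphas, views))
    L = np.diag(W.sum(axis=1)) - W
    J = np.zeros(n)
    J[:n_lab] = 1.0  # indicator of labeled samples
    # Setting the gradient to zero gives (J + mu I + lam L) F = J Y + mu F0.
    A = np.diag(J) + mu * np.eye(n) + lam * L
    F = np.linalg.solve(A, J[:, None] * Y + mu * F0)
    return F.argmax(axis=1)
```

With `mu > 0` the linear system is positive definite, so the joint minimization has a closed-form solution in this simplified setting. A method that also learns the view weights, as the abstract suggests T-PCMV does, would typically alternate between updating the predictions and updating the weights.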
