Multi-View Image Classification With Visual, Semantic and View Consistency

Multi-view visual classification methods have been widely applied to use discriminative information of different views. This strategy has been proven very effective by many researchers. On the one hand, images are often treated independently without fully considering their visual and semantic correlations. On the other hand, view consistency is often ignored. To solve these problems, in this paper, we propose a novel multi-view image classification method with visual, semantic and view consistency (VSVC). For each image, we linearly combine multi-view information for image classification. The combination parameters are determined by considering both the classification loss and the visual, semantic and view consistency. Visual consistency is imposed by ensuring that visually similar images of the same view are predicted to have similar values. For semantic consistency, we impose the locality constraint that nearby images should be predicted to have the same class by multi-view combination. View consistency is also used to ensure that similar images have consistent multi-view combination parameters. An alternative optimization strategy is used to learn the combination parameters. To evaluate the effectiveness of VSVC, we perform image classification experiments on several public datasets. The experimental results on these datasets show the effectiveness of the proposed VSVC method.

[1]  Zhongfei Zhang,et al.  Transductive Zero-Shot Learning With a Self-Training Dictionary Approach , 2017, IEEE Transactions on Cybernetics.

[2]  Nuno Vasconcelos,et al.  Scene Recognition on the Semantic Manifold , 2012, ECCV.

[3]  Wen Li,et al.  Domain Generalization and Adaptation Using Low Rank Exemplar SVMs , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Fei Gao,et al.  Deep Multimodal Distance Metric Learning Using Click Constraints for Image Ranking , 2017, IEEE Transactions on Cybernetics.

[5]  Qi Tian,et al.  Image-Specific Classification With Local and Global Discriminations , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[6]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[7]  Naila Murray,et al.  Revisiting the Fisher vector for fine-grained classification , 2014, Pattern Recognit. Lett..

[8]  Qi Tian,et al.  Multiview Hessian Semisupervised Sparse Feature Selection for Multimedia Analysis , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[9]  Xiao Liu,et al.  Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition , 2016, AAAI.

[10]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[11]  Thomas Maugey,et al.  Optimized Data Representation for Interactive Multiview Navigation , 2017, IEEE Transactions on Multimedia.

[12]  Lin Wu,et al.  Multiview Spectral Clustering via Structured Low-Rank Matrix Factorization , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[13]  Jiajun Bu,et al.  Exemplar-Based Image and Video Stylization Using Fully Convolutional Semantic Features , 2017, IEEE Transactions on Image Processing.

[14]  Qi Tian,et al.  Multiview Semantic Representation for Visual Recognition , 2020, IEEE Transactions on Cybernetics.

[15]  Aswin C. Sankaranarayanan,et al.  Shape and Spatially-Varying Reflectance Estimation from Virtual Exemplars , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Qi Tian,et al.  Image classification using spatial pyramid robust sparse coding , 2013, Pattern Recognit. Lett..

[18]  Bingbing Ni,et al.  HCP: A Flexible CNN Framework for Multi-Label Image Classification , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Luc Van Gool,et al.  AENet: Learning Deep Audio Features for Video Analysis , 2017, IEEE Transactions on Multimedia.

[20]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[21]  Qi Tian,et al.  Unsupervised and Semi-Supervised Image Classification With Weak Semantic Consistency , 2019, IEEE Transactions on Multimedia.

[22]  Qi Tian,et al.  Contextual Exemplar Classifier-Based Image Representation for Classification , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[23]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[24]  Qi Tian,et al.  Multiview, Few-Labeled Object Categorization by Predicting Labels With View Consistency , 2019, IEEE Transactions on Cybernetics.

[25]  Dacheng Tao,et al.  Webly-Supervised Fine-Grained Visual Categorization via Deep Domain Adaptation , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Rama Chellappa,et al.  Learning Common and Feature-Specific Patterns: A Novel Multiple-Sparse-Representation-Based Tracker , 2018, IEEE Transactions on Image Processing.

[27]  Qi Tian,et al.  Image Class Prediction by Joint Object, Context, and Background Modeling , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[28]  Qi Tian,et al.  Image classification by non-negative sparse coding, low-rank and sparse decomposition , 2011, CVPR 2011.

[29]  Jingjing Tang,et al.  Multiview Privileged Support Vector Machines , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[30]  Weiwei Liu,et al.  Multilabel Prediction via Cross-View Search , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[31]  Qi Tian,et al.  Structured Weak Semantic Space Construction for Visual Categorization , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[32]  Yi Yang,et al.  Bi-Level Semantic Representation Analysis for Multimedia Event Detection , 2017, IEEE Transactions on Cybernetics.

[33]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[34]  Feng Zhou,et al.  Fine-Grained Categorization and Dataset Bootstrapping Using Deep Metric Learning with Humans in the Loop , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Qi Tian,et al.  Joint image representation and classification in random semantic spaces , 2015, Neurocomputing.

[36]  Feiping Nie,et al.  Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model , 2013, 2013 IEEE International Conference on Computer Vision.

[37]  Qi Tian,et al.  Multiview Label Sharing for Visual Representations and Classifications , 2018, IEEE Transactions on Multimedia.

[38]  Subhransu Maji,et al.  Bilinear Convolutional Neural Networks for Fine-Grained Visual Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Ke Lu,et al.  Low-Rank Discriminant Embedding for Multiview Learning , 2017, IEEE Transactions on Cybernetics.

[40]  Ming Shao,et al.  Missing Modality Transfer Learning via Latent Low-Rank Constraint , 2015, IEEE Transactions on Image Processing.

[41]  Guna Seetharaman,et al.  Multiview Boosting With Information Propagation for Classification , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[42]  Luming Zhang,et al.  Multiview Physician-Specific Attributes Fusion for Health Seeking , 2017, IEEE Transactions on Cybernetics.

[43]  Jonghyun Choi,et al.  Mining Discriminative Triplets of Patches for Fine-Grained Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Lei Zhang,et al.  Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Liang-Tien Chia,et al.  Laplacian Sparse Coding, Hypergraph Laplacian Sparse Coding, and Applications , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Qi Tian,et al.  Beyond Explicit Codebook Generation: Visual Representation Using Implicitly Transferred Codebooks , 2015, IEEE Transactions on Image Processing.

[47]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[48]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[49]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[50]  Qingming Huang,et al.  Image classification by non-negative sparse coding, correlation constrained low-rank and sparse decomposition , 2014, Comput. Vis. Image Underst..

[51]  Wei Yuan,et al.  Multi-view manifold learning with locality alignment , 2018, Pattern Recognit..

[52]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Philip S. Yu,et al.  Multiple Structure-View Learning for Graph Classification , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[54]  Huda Khayrallah,et al.  Deep Generalized Canonical Correlation Analysis , 2017, RepL4NLP@ACL.

[55]  Jian Yang,et al.  Boosted Convolutional Neural Networks , 2016, BMVC.

[56]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[57]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Shiguang Shan,et al.  Multi-View Discriminant Analysis , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Dieter Fox,et al.  Multipath Sparse Coding Using Hierarchical Matching Pursuit , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[60]  David A. Forsyth,et al.  Describing objects by their attributes , 2009, CVPR.

[61]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Qi Tian,et al.  Incremental Codebook Adaptation for Visual Representation and Categorization , 2018, IEEE Transactions on Cybernetics.

[63]  Maja Pantic,et al.  Discriminative Shared Gaussian Processes for Multiview and View-Invariant Facial Expression Recognition , 2015, IEEE Transactions on Image Processing.

[64]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[65]  Alfred O. Hero,et al.  Efficient learning of sparse, distributed, convolutional feature representations for object recognition , 2011, 2011 International Conference on Computer Vision.

[66]  Wei-Shi Zheng,et al.  Semi-Supervised Multi-View Discrete Hashing for Fast Image Search , 2017, IEEE Transactions on Image Processing.

[67]  Qi Tian,et al.  Birds of a feather flock together: Visual representation with scale and class consistency , 2018, Inf. Sci..

[68]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[69]  Yun Fu,et al.  Marginalized Denoising Dictionary Learning With Locality Constraint , 2018, IEEE Transactions on Image Processing.

[70]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[71]  Qi Tian,et al.  Image classification using Harr-like transformation of local features with coding residuals , 2013, Signal Process..

[72]  Qi Tian,et al.  Image classification by search with explicitly and implicitly semantic representations , 2017, Inf. Sci..

[73]  Qi Tian,et al.  Image-level classification by hierarchical structure learning with visual and semantic similarities , 2018, Inf. Sci..

[74]  Qi Tian,et al.  Semantically Modeling of Object and Context for Categorization , 2019, IEEE Transactions on Neural Networks and Learning Systems.