Multiview Label Sharing for Visual Representations and Classifications

Different views represent different aspects of images. It is more effective to combine them for visual classifications. This paper proposes a novel multiview label sharing method to combine the discriminative power of different views for classifications. Especially, we linearly transfer different views into a shared space for representations. The inter-view similarities are kept in the shared space for each view. We also ensure the intra-view similarities of the same class between different views are preserved in the shared space. We jointly learn the classifiers and transformation matrices by minimizing the summed classification loss along with the inter-view and intra-view similarity constraints. In this paper, the inter-view constraints refer to the similarities between images of the corresponding view, whereas the intra-view constraints refer to the similarities between different views of images with the same semantics. Experimental results and analysis on several public datasets show the effectiveness of the proposed multiview label sharing method for visual classifications.

[1]  Dieter Fox,et al.  Multipath Sparse Coding Using Hierarchical Matching Pursuit , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Qingshan Liu,et al.  Elastic Net Hypergraph Learning for Image Clustering and Semi-Supervised Classification , 2016, IEEE Transactions on Image Processing.

[4]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Qiang Yang,et al.  Translated Learning: Transfer Learning across Different Feature Spaces , 2008, NIPS.

[7]  Qi Tian,et al.  Incremental Codebook Adaptation for Visual Representation and Categorization , 2018, IEEE Transactions on Cybernetics.

[8]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[10]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[11]  Yun Fu,et al.  Learning Robust and Discriminative Subspace With Low-Rank Constraints , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[12]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[13]  Ming Shao,et al.  Cross-Modality Feature Learning Through Generic Hierarchical Hyperlingual-Words , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[14]  Bingbing Ni,et al.  HCP: A Flexible CNN Framework for Multi-Label Image Classification , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Wenzhong Guo,et al.  Sparse Multigraph Embedding for Multimodal Feature Representation , 2017, IEEE Transactions on Multimedia.

[16]  Peter W. Foltz,et al.  An introduction to latent semantic analysis , 1998 .

[17]  Alfred O. Hero,et al.  Efficient learning of sparse, distributed, convolutional feature representations for object recognition , 2011, 2011 International Conference on Computer Vision.

[18]  Qi Tian,et al.  Bundled Local Features for Image Representation , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[19]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[20]  Florent Perronnin,et al.  Fisher vectors meet Neural Networks: A hybrid classification architecture , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Michael Isard,et al.  A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics , 2012, International Journal of Computer Vision.

[22]  Luc Van Gool,et al.  TriCoS: A Tri-level Class-Discriminative Co-segmentation Method for Image Classification , 2012, ECCV.

[23]  Tieniu Tan,et al.  Joint Feature Selection and Subspace Learning for Cross-Modal Retrieval , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[25]  Song-Chun Zhu,et al.  Joint Image-Text News Topic Detection and Tracking by Multimodal Topic And-Or Graph , 2017, IEEE Transactions on Multimedia.

[26]  Ming Shao,et al.  Missing Modality Transfer Learning via Latent Low-Rank Constraint , 2015, IEEE Transactions on Image Processing.

[27]  Cordelia Schmid,et al.  Combining efficient object localization and image classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Qi Tian,et al.  Beyond Explicit Codebook Generation: Visual Representation Using Implicitly Transferred Codebooks , 2015, IEEE Transactions on Image Processing.

[29]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[30]  Stan Z. Li,et al.  2D–3D face matching using CCA , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[31]  Ming Shao,et al.  Sparse low-rank fusion based deep features for missing modality face recognition , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[32]  Christoph Meinel,et al.  A deep semantic framework for multimodal representation learning , 2016, Multimedia Tools and Applications.

[33]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[34]  Hongliang Li,et al.  PBC: Polygon-Based Classifier for Fine-Grained Categorization , 2017, IEEE Transactions on Multimedia.

[35]  Jian Dong,et al.  Subcategory-Aware Object Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Qingming Huang,et al.  Image classification by non-negative sparse coding, correlation constrained low-rank and sparse decomposition , 2014, Comput. Vis. Image Underst..

[37]  Changsheng Xu,et al.  Low-Rank Sparse Coding for Image Classification , 2013, 2013 IEEE International Conference on Computer Vision.

[38]  Roger Levy,et al.  A new approach to cross-modal multimedia retrieval , 2010, ACM Multimedia.

[39]  Qi Tian,et al.  Object categorization in sub-semantic space , 2014, Neurocomputing.

[40]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[41]  Yu Zhang,et al.  Exploit Bounding Box Annotations for Multi-Label Object Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Chao Liang,et al.  Fine-Grained Image Classification via Low-Rank Sparse Coding With General and Class-Specific Codebooks. , 2017, IEEE transactions on neural networks and learning systems.

[43]  Xiaoqin Zhang,et al.  Use bin-ratio information for category and scene classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44]  Liang-Tien Chia,et al.  Laplacian Sparse Coding, Hypergraph Laplacian Sparse Coding, and Applications , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Qi Tian,et al.  Image Class Prediction by Joint Object, Context, and Background Modeling , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[46]  Qi Tian,et al.  Image classification by non-negative sparse coding, low-rank and sparse decomposition , 2011, CVPR 2011.

[47]  Pengpeng Zhao,et al.  Weak-Labeled Active Learning With Conditional Label Dependence for Multilabel Image Classification , 2017, IEEE Transactions on Multimedia.

[48]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[49]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[51]  Jiwen Lu,et al.  Deep Coupled Metric Learning for Cross-Modal Matching , 2017, IEEE Transactions on Multimedia.

[52]  Chien-Li Chou,et al.  Multimodal Video-to-Near-Scene Annotation , 2017, IEEE Transactions on Multimedia.

[53]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Analysis , 1999, UAI.

[54]  David W. Jacobs,et al.  Generalized Multiview Analysis: A discriminative latent space , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[56]  Wenwu Zhu,et al.  Learning Compact Hash Codes for Multimodal Representations Using Orthogonal Deep Structure , 2015, IEEE Transactions on Multimedia.

[57]  Qi Tian,et al.  Contextual Exemplar Classifier-Based Image Representation for Classification , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[58]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Qi Tian,et al.  Image-Specific Classification With Local and Global Discriminations , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[60]  Shenghuo Zhu,et al.  Efficient Object Detection and Segmentation for Fine-Grained Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Chengjun Liu,et al.  A Sparse Representation Model Using the Complete Marginal Fisher Analysis Framework and Its Applications to Visual Recognition , 2017, IEEE Transactions on Multimedia.

[62]  Shuicheng Yan,et al.  Visual classification with multi-task joint sparse representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[63]  Shiguang Shan,et al.  Multi-View Discriminant Analysis , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[64]  Bo Zhao,et al.  Diversified Visual Attention Networks for Fine-Grained Object Classification , 2016, IEEE Transactions on Multimedia.