Multiview Semantic Representation for Visual Recognition

Due to interclass and intraclass variations, the images of different classes are often cluttered, which makes efficient classification difficult. Discriminative classification algorithms help to alleviate this problem. However, accurately modeling the relationship between visual representations and human perception remains an open problem. To address these problems, we propose a novel multiview semantic representation (MVSR) algorithm for efficient visual recognition. First, we use visually based methods to obtain initial image representations. We then use both visual and semantic similarities to divide the images into groups, which are in turn used to build semantic representations. We treat different image representation strategies, partition methods, and partition numbers as different views. A graph is then used to combine the discriminative power of the different views, and the similarity between images is obtained by measuring the similarity of their graphs. Finally, we train classifiers to predict the categories of images. We evaluate the discriminative power of the proposed MVSR method for visual recognition on several public image datasets. Experimental results show the effectiveness of the proposed method.
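The pipeline outlined above can be illustrated with a minimal sketch. It is not the MVSR implementation: the abstract does not give the grouping criterion, the graph construction, or the classifier, so everything below is an assumption made for illustration. Groups are formed by plain k-means on visual features only (semantic similarity is left out), each (feature set, number of groups) pair is treated as one view, the views are combined by averaging their similarity matrices in place of the graph, and a nearest-neighbor rule stands in for the trained classifiers. All function names are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic, evenly spaced initial centroids."""
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            buckets[nearest].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty group keeps its previous centroid
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def group_representation(x, centroids):
    """Re-describe one image by its normalized affinity to every group."""
    weights = [math.exp(-math.dist(x, c)) for c in centroids]
    total = sum(weights)
    return [w / total for w in weights]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def multiview_similarity(features_by_view, ks):
    """Average the image-to-image similarity over all views.

    Assumption: one view = one feature set combined with one group count k.
    Averaging is a simple stand-in for the graph-based combination.
    """
    n = len(features_by_view[0])
    total = [[0.0] * n for _ in range(n)]
    n_views = 0
    for feats in features_by_view:
        for k in ks:
            centroids = kmeans(feats, k)
            reps = [group_representation(x, centroids) for x in feats]
            for i in range(n):
                for j in range(n):
                    total[i][j] += cosine(reps[i], reps[j])
            n_views += 1
    return [[v / n_views for v in row] for row in total]


def predict(sim, train_idx, train_labels, query):
    """Label of the most similar training image (stand-in classifier)."""
    best = max(range(len(train_idx)), key=lambda t: sim[query][train_idx[t]])
    return train_labels[best]


# Toy data: six images, two classes, described by two feature sets.
view_a = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
view_b = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
sim = multiview_similarity([view_a, view_b], ks=[2, 3])
print(predict(sim, [0, 1, 3, 4], ["a", "a", "b", "b"], query=2))
```

In this toy setup, images 2 and 5 are held out and labeled from the four training images. Adding a feature set or a value of k adds a view without changing the rest of the code, which mirrors the idea that representation strategies, partition methods, and partition numbers each contribute views.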
