Semantically Modeling of Object and Context for Categorization

Object-centric categorization methods have proven more effective than hard partitions of images (e.g., spatial pyramid matching). However, determining the locations of objects remains an open problem. In addition, context areas are often modeled together with the background rather than separately. Moreover, methods that use only visual representations for classification often ignore semantic information. In this paper, we propose an object categorization method that semantically models object and context information (SOC). We first select a number of candidate regions with high confidence scores and represent these regions semantically by measuring the correlation of each region with prelearned classifiers (e.g., local feature-based classifiers and deep convolutional-neural-network-based classifiers). These regions are then clustered to select objects, and the remaining selected regions are viewed as context areas. Areas of an image outside the object and context areas are treated as the background. The visually and semantically represented objects and contexts are then used, along with the background area, for object representation and categorization. Experimental results on several public data sets demonstrate the effectiveness of the proposed method.
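
The region-splitting steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: the prelearned classifiers are taken to be linear, clustering is a tiny deterministic 2-means, and the object cluster is assumed to be the one with the higher mean confidence score. All function and parameter names (`semantic_representation`, `select_candidates`, `two_means`, `split_regions`, `top_k`) are hypothetical.

```python
import numpy as np


def semantic_representation(visual_feats, clf_weights, clf_bias):
    """Semantic vector per region: responses of prelearned classifiers.

    Assumption: the classifiers are linear (one row of clf_weights each).
    """
    return visual_feats @ clf_weights.T + clf_bias


def select_candidates(scores, top_k):
    """Indices of the top_k regions by confidence score, best first."""
    return np.argsort(-scores, kind="stable")[:top_k]


def two_means(points, seed_idx, n_iter=20):
    """Deterministic 2-means, seeded with one point and the point farthest from it."""
    far_idx = np.argmax(np.linalg.norm(points - points[seed_idx], axis=1))
    centers = np.stack([points[seed_idx], points[far_idx]]).astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Distance of every point to both centers, shape (n_points, 2).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels


def split_regions(boxes, scores, visual_feats, clf_weights, clf_bias,
                  image_shape, top_k):
    """Split an image into object regions, context regions, and a background mask.

    boxes: integer array of (x0, y0, x1, y1) rows; image_shape: (height, width).
    """
    keep = select_candidates(scores, top_k)
    semantic = semantic_representation(visual_feats[keep], clf_weights, clf_bias)
    # Seed the clustering with the most confident region (position 0 in keep).
    labels = two_means(semantic, seed_idx=0)
    # Assumption: the cluster with the higher mean confidence holds the objects.
    kept_scores = scores[keep]
    cluster_means = [kept_scores[labels == c].mean() if np.any(labels == c)
                     else -np.inf for c in range(2)]
    obj_cluster = int(np.argmax(cluster_means))
    object_idx = keep[labels == obj_cluster]
    context_idx = keep[labels != obj_cluster]  # the other selected regions
    # Background: every pixel not covered by any selected region.
    background = np.ones(image_shape, dtype=bool)
    for x0, y0, x1, y1 in boxes[keep]:
        background[y0:y1, x0:x1] = False
    return object_idx, context_idx, background
```

In this sketch, a final image representation could be formed by pooling the visual and semantic vectors of the object and context regions and combining them with a descriptor of the background mask; how the paper performs that combination is not stated in the abstract.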
