Image Class Prediction by Joint Object, Context, and Background Modeling

State-of-the-art image classification methods often use spatial pyramid matching or its variants to exploit the spatial layout of visual features. However, objects may appear at various positions with different scales and orientations. Moreover, traditional object-centric methods consider only objects and the background, without fully exploring context information. To address these problems, we propose a novel image classification method that jointly models object, context, and background information (OCB). OCB consists of three components: 1) locating the positions of objects; 2) determining the context areas of objects; and 3) treating the remaining areas as the background. We use objectness proposal techniques to select candidate bounding boxes. Boxes with high confidence scores are combined to determine the objects’ positions. To select the context areas, we use candidate boxes whose confidence scores are relatively lower than those of the boxes used for object localization. The remaining areas are viewed as the background. We combine the object, context, and background for image representation and classification. Experiments on six data sets demonstrate the superiority of the proposed OCB method over other spatial partition methods.
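The three-way partition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the thresholds `obj_thresh` and `ctx_thresh`, the union-of-boxes rule for combining boxes, and the mean pooling over a scalar feature map are all assumptions made for illustration, since the abstract does not specify them. Candidate boxes and their confidence scores are assumed to come from some objectness proposal method.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive

BACKGROUND, CONTEXT, OBJECT = 0, 1, 2


def partition_image(height: int, width: int,
                    boxes: Sequence[Box], scores: Sequence[float],
                    obj_thresh: float = 0.8,   # hypothetical threshold
                    ctx_thresh: float = 0.4    # hypothetical threshold
                    ) -> List[List[int]]:
    """Label every pixel as OBJECT, CONTEXT, or BACKGROUND.

    Assumed rule: boxes scoring >= obj_thresh form the object area,
    boxes scoring in [ctx_thresh, obj_thresh) form the context area,
    and everything left uncovered is background.
    """
    mask = [[BACKGROUND] * width for _ in range(height)]
    # Paint context first so that object boxes override it where they overlap.
    passes = ((CONTEXT, ctx_thresh, obj_thresh),
              (OBJECT, obj_thresh, float("inf")))
    for label, low, high in passes:
        for (x1, y1, x2, y2), score in zip(boxes, scores):
            if low <= score < high:
                for y in range(max(0, y1), min(height, y2)):
                    for x in range(max(0, x1), min(width, x2)):
                        mask[y][x] = label
    return mask


def pool_regions(mask: List[List[int]],
                 feature_map: List[List[float]]) -> List[float]:
    """Mean-pool a scalar feature map per region and concatenate the
    results as [object, context, background]."""
    sums = {OBJECT: 0.0, CONTEXT: 0.0, BACKGROUND: 0.0}
    counts = {OBJECT: 0, CONTEXT: 0, BACKGROUND: 0}
    for mask_row, feat_row in zip(mask, feature_map):
        for label, value in zip(mask_row, feat_row):
            sums[label] += value
            counts[label] += 1
    return [sums[k] / counts[k] if counts[k] else 0.0
            for k in (OBJECT, CONTEXT, BACKGROUND)]


if __name__ == "__main__":
    boxes = [(0, 0, 2, 2), (1, 1, 4, 4), (5, 0, 6, 1)]
    scores = [0.9, 0.5, 0.1]  # high, medium, and low confidence
    mask = partition_image(4, 6, boxes, scores)
    for row in mask:
        print(row)
```

In this sketch the low-confidence box is ignored entirely, so the pixels it covers fall into the background. A real pipeline would pool local descriptors or CNN features per region rather than a scalar map, and would feed the concatenated vector to a classifier.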
