Context Learning Network for Object Detection

Current state-of-the-art detectors typically classify candidate proposals using their interior features. However, the valuable contexts are not fully exploited by existing detectors yet, which limits the detection performance. In this paper, we present a context learning network (CLN), which aims to capture pairwise relation between objects and global contexts of each object. The proposed CLN consists of two subnetworks: a multi-layer perceptron (MLP) with three layers and a convolutional neural network (ConvNet) with two layers. The MLP is first designed to capture the pairwise relation context. Pairwise relationship context is then gathered and concatenated to further learn the global contexts by the ConvNet. Finally, we obtain the desired context feature maps with rich contextual information that are useful for accurate object detection. The proposed CLN is lightweight and it is easy to embed in any existing networks for object detection. In this paper, we present a context-aware Faster-RCNN with the proposed CLN and conduct extensive experiments to evaluate its performance. Experimental results demonstrate that the context-aware Faster-RCNN achieves state-of-the-art performance with the 82.1%, 80.7% and 38.4%mAPs on VOC 2007, VOC 2012 and COCO datasets, respectively.

[1]  Bingbing Ni,et al.  HCP: A Flexible CNN Framework for Multi-Label Image Classification , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  tephen E. Palmer The effects of contextual scenes on the identification of objects , 1975, Memory & cognition.

[3]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jian Dong,et al.  Attentive Contexts for Object Detection , 2016, IEEE Transactions on Multimedia.

[6]  Daphna Weinshall,et al.  Exploiting Object Hierarchy: Combining Models from Different Category Levels , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[7]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[8]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Alexei A. Efros,et al.  An empirical study of context in object detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[11]  Antonio Torralba,et al.  Context-based vision system for place and object recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[12]  Abhinav Gupta,et al.  Contextual Priming and Feedback for Faster R-CNN , 2016, ECCV.

[13]  David A. Forsyth,et al.  Learning a sequential search for landmarks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[15]  Rui Zhang,et al.  Contextual Object Detection With Spatial Context Prototypes , 2014, IEEE Transactions on Multimedia.

[16]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[17]  Trevor Darrell,et al.  Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  M. Bar Visual objects in context , 2004, Nature Reviews Neuroscience.

[19]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2015, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Saurabh Singh,et al.  Where to Look: Focus Regions for Visual Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[22]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiaogang Wang,et al.  Window-Object Relationship Guided Representation Learning for Generic Object Detections , 2015, ArXiv.

[24]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  A. Torralba,et al.  The role of context in object recognition , 2007, Trends in Cognitive Sciences.

[26]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[28]  Andrew Y. Ng,et al.  End-to-End People Detection in Crowded Scenes , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Antonio Torralba,et al.  Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[30]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Trevor Darrell,et al.  Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.

[32]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[33]  Serge J. Belongie,et al.  Context based object categorization: A critical survey , 2010, Comput. Vis. Image Underst..

[34]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[35]  Serge J. Belongie,et al.  Object categorization using co-occurrence, location and appearance , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Zhuowen Tu,et al.  Auto-Context and Its Application to High-Level Vision Tasks and 3D Brain Image Segmentation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[38]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.