Learning Diversified Features for Object Detection via Multi-region Occlusion Example Generating

Object detection refers to the classification and localization of objects within an image by learning their diversified features. However, the existing detection models are usually sensitive to the important features in some local regions of the object. The existing algorithms cannot learn the diversified features regarding to each region effectively, which limit the performance of the model to a certain range. In this paper, we propose a novel and principle method called Multi-region Occlusion Example Generating (MOEG) to guide the detection model in fully learning the features of the various regions of the object. MOEG can generate completely new occlusion examples and it enables our detection model to learn the features of the remaining regions in the object by blocking the important regions in the proposal. It is a general method to generate occlusion examples and it can be implemented to most mainstream region-based detectors very easily such as Fast-RCNN and Faster-RCNN. Our experimental results indicate a \(2.4\%\) mAP boost on VOC2007 dataset and a 4.1% mAP boost on VOC2012 dataset compared to the Fast-RCNN pipeline. And as datasets become larger and more challenge, our method MOEG become more effective as demonstrated by the results on the MS COCO dataset.

[1]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Frank Hutter,et al.  Online Batch Selection for Faster Training of Neural Networks , 2015, ArXiv.

[4]  Xiaogang Wang,et al.  Gated Bi-directional CNN for Object Detection , 2016, ECCV.

[5]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[6]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Hao Yu,et al.  S-OHEM: Stratified Online Hard Example Mining for Object Detection , 2017, CCCV.

[9]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[13]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[14]  Nikos Komodakis,et al.  Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Xinlei Chen,et al.  Spatial Memory for Context Reasoning in Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Strategies From Data , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[19]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[20]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[22]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[23]  Yong Jae Lee,et al.  Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[25]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Abhinav Gupta,et al.  A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).