Robust Faster R-CNN: Increasing Robustness to Occlusions and Multi-scale Objects

Recognizing objects at vastly different scales and objects under occlusion is a fundamental challenge in computer vision. In this paper, we propose Robust Faster R-CNN, a method for detecting objects in multi-label images that builds on the Faster R-CNN architecture. We replace the RoIPool layers of Faster R-CNN with RoIAlign to remove the harsh quantization of RoIPool, and we design a multi-RoIAlign layer that applies RoIAlign with several different pooling sizes in order to adapt to objects of different sizes. Furthermore, we adopt multi-feature fusion to improve the recognition of small objects. During training, we train an adversarial network to generate occluded examples and combine it with our detector to make the model invariant to occlusions. Experimental results on the Pascal VOC 2007 and 2012 datasets demonstrate the superiority of the proposed approach over many state-of-the-art approaches.
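To make the RoIAlign idea concrete, the following is a minimal pure-Python sketch, not the implementation used in this work. It samples the feature map at real-valued bin centres with bilinear interpolation, so RoI coordinates are never rounded. The function names (`bilinear`, `roi_align`, `multi_roi_align`), the single sample per bin, the output sizes `(2, 4)`, and the choice to concatenate the flattened outputs are all illustrative assumptions; the abstract does not specify how the multi-size outputs are combined.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at real-valued (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp so that samples near the border stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, roi, out_size):
    """RoIAlign sketch: one bilinear sample at the centre of each output bin.

    roi = (y1, x1, y2, x2) in feature-map coordinates, kept as floats
    (no rounding, unlike RoIPool).
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    return [
        [
            bilinear(feat, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


def multi_roi_align(feat, roi, out_sizes=(2, 4)):
    """Assumed combination: run roi_align at several sizes and concatenate."""
    pooled = []
    for size in out_sizes:
        for row in roi_align(feat, roi, size):
            pooled.extend(row)
    return pooled
```

For example, `multi_roi_align(feat, (0.5, 0.5, 4.5, 4.5))` returns a flat list of 2·2 + 4·4 = 20 values for a single-channel map. A real detector would do this per channel on batched tensors and typically average several samples per bin.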
