Paper Information - A Multi-strategy Region Proposal Network

A Multi-strategy Region Proposal Network

Abstract: The Faster Region-based Convolutional Network (Faster R-CNN) was recently proposed and achieves outstanding performance in object detection. Specifically, its Region Proposal Network (RPN) is designed to efficiently predict region proposals over a wide range of scales and aspect ratios. Nevertheless, when the number and quality of the region proposals generated by RPN are not ideal, the detection performance of Faster R-CNN suffers. In this paper, multiple strategies are applied to address these limitations and improve RPN, resulting in a novel architecture for region proposal generation named the Multi-strategy Region Proposal Network (MSRPN). MSRPN introduces four improvements. First, a novel skip-layer connection network is designed to combine multi-level features and boost the ability of the pooling layers, which strengthens the quality of the region proposals. Second, improved anchor boxes are introduced with adaptive aspect ratios and evenly distributed intervals between the selected scales. In this way, the number of predicted region proposals needed for detection is greatly reduced and the efficiency of object localization is increased. In particular, the first and second improvements together enhance small object detection. Third, the classification layer and the regression layer are unified into a single convolutional layer, which reduces the model complexity of the output layer and accelerates both training and testing. Fourth, the bounding box regression term of the multi-task loss function in RPN is improved, which in turn improves bounding box regression performance. In the experiments, MSRPN is compared with the Fast Region-based Convolutional Network (Fast R-CNN), Faster R-CNN, Inside-Outside Net (ION), Multi-region CNN (MR-CNN) and HyperNet. MSRPN achieves state-of-the-art mean average precision (mAP) of 78.9%, 74.8% and 32.1% on the PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO data sets, respectively, with the deep VGG-16 model, surpassing the other five object detection methods. These results are obtained with only 150 region proposals per image. Additionally, MSRPN performs well on small object detection. Furthermore, MSRPN runs at 6 fps, which is faster than the other methods. In conclusion, MSRPN can provide important support for intelligent object detection systems.
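
The abstract does not say how MSRPN chooses its adaptive aspect ratios or its scale intervals. For orientation only, here is a minimal sketch of the baseline anchor scheme that RPN uses in Faster R-CNN: 3 scales times 3 aspect ratios at every feature-map location. The function names `make_anchors` and `shift_anchors` and all parameter names are illustrative assumptions, and the feature stride of 16 assumes a VGG-16 backbone. This is not the authors' implementation.

```python
import numpy as np

def make_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return (len(ratios) * len(scales), 4) anchors as [x1, y1, x2, y2],
    centered on the origin. Defaults mirror the Faster R-CNN baseline
    (areas 128^2, 256^2, 512^2), not MSRPN's improved anchors."""
    anchors = []
    for ratio in ratios:                      # ratio = height / width
        for scale in scales:
            area = (base_size * scale) ** 2   # anchor area is fixed per scale
            w = np.sqrt(area / ratio)
            h = w * ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Tile the anchor set over every cell of a feat_h x feat_w feature map."""
    xs = (np.arange(feat_w) + 0.5) * stride   # cell centers in image pixels
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (shifts + anchors[None, :, :]).reshape(-1, 4)
```

Passing evenly spaced values such as `scales=(4, 8, 12, 16)` is one hypothetical reading of "evenly distributed intervals between the selected scales"; these numbers are placeholders and are not taken from the paper.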
