Parallel Search by Reinforcement Learning for Object Detection

Object detection algorithms generally search through extensive potential areas without considering spatial correlations. To fully utilize rich information contained in high-level image features, a hierarchical object detection method with parallel search formulated as Markov Decision Process, is presented. Starting from independent initial locations, our model generates adequate region proposals by Reinforcement Learning (RL) method for subsequent refinement of bounding boxes. An attention-based state initialization algorithm combined with a novel reward function for RL training are proposed to facilitate the agent’s control over window transformations. Following a coarse-to-fine detection strategy, we adopt adjustable action parameters and perform profound refinement for the generated proposals. Compared with existing detection algorithms, experiments on PASCAL VOC 2007 & 2012 dataset indicate the proposed model achieves encouraging object detection performance with fewer proposals generated.

[1]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Wilson S. Geisler,et al.  Optimal eye movement strategies in visual search , 2005, Nature.

[3]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Bernt Schiele,et al.  What Makes for Effective Detection Proposals? , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[6]  Shuicheng Yan,et al.  Tree-Structured Reinforcement Learning for Sequential Object Localization , 2016, NIPS.

[7]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[8]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[10]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[11]  M. Carrasco Visual attention: The past 25 years , 2011, Vision Research.

[12]  Miriam Bellver,et al.  Hierarchical Object Detection with Deep Reinforcement Learning , 2016, NIPS 2016.

[13]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Svetlana Lazebnik,et al.  Active Object Localization with Deep Reinforcement Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[17]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[19]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[21]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.