Active Object Detection With Multistep Action Prediction Using Deep Q-Network

In recent years, great success has been achieved in visual object detection, one of the fundamental tasks in industrial intelligence. Most existing methods are designed for single, well-captured still images. In practical robotic applications, however, nuisances such as tiny scale, partial view, or occlusion mean that one still image may not contain enough information for object detection. An intelligent robot, by contrast, can adjust its viewpoint to obtain better images for detection, which makes active object detection an important perception strategy for intelligent robots. In this paper, active object detection is formulated as a sequential action decision process, and a deep reinforcement learning framework is established to solve it. Furthermore, a novel deep Q-learning network (DQN) with a dueling architecture is proposed. The network has two separate output channels: one predicts the action type and the other predicts the action range. By combining the two output channels, the action space is explored more efficiently. Several methods are extensively validated, and the results show that the proposed method obtains the best results and predicts actions in real time.
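The two-channel output described above can be illustrated with a minimal sketch. It is not the paper's implementation: the action names, the range values, the shared state value, and the rule of taking an independent argmax on each channel are all assumptions made for illustration. Only the dueling aggregation Q(a) = V + A(a) - mean(A) is the standard form from the dueling-network literature.

```python
from typing import List, Tuple

# Hypothetical action vocabulary; the abstract does not list the actual actions.
ACTION_TYPES = ["forward", "backward", "left", "right"]
ACTION_RANGES = [1, 2, 3]  # hypothetical number of steps per action


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Standard dueling aggregation: Q(a) = V + A(a) - mean(A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def argmax(xs: List[float]) -> int:
    """Index of the largest element (first one wins on ties)."""
    return max(range(len(xs)), key=xs.__getitem__)


def select_action(
    value: float, type_adv: List[float], range_adv: List[float]
) -> Tuple[str, int]:
    """Combine the two channels into one multistep action.

    Assumption: both channels share one state value and each channel is
    maximised independently; the paper may combine them differently.
    """
    q_type = dueling_q(value, type_adv)
    q_range = dueling_q(value, range_adv)
    return ACTION_TYPES[argmax(q_type)], ACTION_RANGES[argmax(q_range)]


# Toy numbers standing in for network outputs.
chosen = select_action(0.5, [0.1, 0.9, 0.3, 0.2], [0.4, 0.1, 0.8])
```

With these hypothetical sizes, the two channels need 4 + 3 = 7 outputs, whereas a single channel over every (type, range) pair would need 4 × 3 = 12. This is one plausible reading of why factoring the output could make exploring the action space cheaper.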
