A robot exploration strategy based on Q-learning network

This paper introduces a reinforcement learning method for exploring a corridor environment using only the depth information from an RGB-D sensor. The robot controller acquires obstacle-avoidance ability by pre-training feature maps on the depth information. The system builds on the recent Deep Q-Network (DQN) framework, in which a convolutional neural network estimates the Q-values of the Q-learning method. We separate the DQN into a supervised deep learning structure and a Q-learning network. Experiments with a TurtleBot in the Gazebo simulation environment demonstrate robustness to different kinds of corridor environments. All of the experiments use the same pre-trained deep learning structure, and the robot travels in environments that differ from the pre-training environment. To our knowledge, this is the first time raw sensor information has been used to build such an exploration strategy for robotics through reinforcement learning.
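The separation described above can be pictured as two stages: a convolutional network pre-trained with supervision on depth images, and a Q-learning head that maps its features to Q-values over a small set of discrete motion commands. The sketch below is a minimal illustration of that structure, not the authors' implementation; the layer sizes, the 84x84 depth input, the three-action command set, and the use of PyTorch are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of the two-stage structure:
# stage 1 pre-trains CNN feature maps on depth images with supervision,
# stage 2 trains a Q-learning head on top of the fixed features.
import torch
import torch.nn as nn

N_ACTIONS = 3  # assumed discrete commands: turn left, go straight, turn right

class DepthFeatureNet(nn.Module):
    """CNN feature maps, pre-trained with supervised labels on depth images."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )

    def forward(self, depth):          # depth: (B, 1, 84, 84)
        return self.conv(depth)

class QHead(nn.Module):
    """Q-learning network trained on top of the pre-trained feature maps."""
    def __init__(self, feat_dim, n_actions=N_ACTIONS):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, feats):
        return self.fc(feats)          # one Q-value per action

features = DepthFeatureNet()
with torch.no_grad():                  # probe the flattened feature size
    feat_dim = features(torch.zeros(1, 1, 84, 84)).shape[1]
q_head = QHead(feat_dim)

depth_frame = torch.rand(1, 1, 84, 84)             # stand-in for a depth image
q_values = q_head(features(depth_frame).detach())  # detach: only the Q-head learns
action = q_values.argmax(dim=1).item()             # greedy action selection
```

Freezing the pre-trained extractor (the `detach()` above) is one plausible reading of the paper's separation: reinforcement learning then only has to fit the comparatively small Q-value head rather than the full network.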
