Natural Behavior Learning Based on Deep Reinforcement Learning for Autonomous Navigation of Mobile Robots

This paper proposes a new autonomous navigation method for a two-wheeled mobile robot equipped with a LiDAR sensor in an unknown environment. Recently, the Deep Q-Network (DQN), which combines deep learning with Q-learning, has attracted attention as a reinforcement learning algorithm. It is used here to train the robot to recognize obstacles and avoid collisions while moving toward a designated destination. The conventional DQN handles only discrete, low-dimensional action spaces, which makes it unsuitable for continuous tasks, particularly the control of a mobile robot. The existing LiDAR-based method uses distance values as the input state for learning, so the system selects the next action based only on the distance between the robot and obstacles. Because the action value fluctuates frequently in this process, the robot performs unnatural acceleration and deceleration, which not only subjects it to physical shocks but also lowers its driving-power efficiency. In this paper, the problem is solved by applying a replay buffer that stores the output of the network: the action values are stored in memory and fed back into the input in accordance with the network's action sequence. Experiments are carried out on an actual robot after reinforcement learning in ROS-Gazebo simulations, and the validity of the algorithm is verified through analysis of the experimental data.
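The core mechanism described above, a replay memory that also carries the previous action back into the network's input alongside the LiDAR distances, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `ReplayBuffer` and `augment_state`, the one-hot action encoding, and the buffer capacity are all assumptions made for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience-replay memory. Each stored transition keeps
    the action taken, so past action values can be replayed and fed back
    into the network input, as the paper proposes."""

    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform random minibatch for a DQN-style update
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


def augment_state(lidar_scan, prev_action, num_actions=5):
    """Concatenate the LiDAR distance readings with a one-hot encoding of
    the previous action, so the network conditions on its own recent
    behavior and avoids abrupt acceleration/deceleration switches."""
    one_hot = [0.0] * num_actions
    one_hot[prev_action] = 1.0
    return list(lidar_scan) + one_hot
```

With this state augmentation, two otherwise identical LiDAR scans yield different inputs depending on the action just taken, which is one plausible way to realize the paper's feedback of stored action values into the input.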
