RSMDP-Based Robust Q-Learning for Optimal Path Planning in a Dynamic Environment

This paper presents a robust Q-learning method for path planning in a dynamic environment. The method consists of three steps: first, a regime-switching Markov decision process (RSMDP) is formed to represent the dynamic environment; second, a probabilistic roadmap (PRM) is constructed, integrated with the RSMDP, and stored as a graph whose nodes correspond to collision-free world states for the robot; and third, an online Q-learning method with a dynamic stepsize, which facilitates robust convergence of the Q-value iteration, is integrated with the PRM to determine an optimal path to the goal. In this manner, the robot is able to use past experience to improve its performance in avoiding not only static obstacles but also moving obstacles, without knowing the nature of the obstacle motion. The use of regime switching in the avoidance of obstacles with unknown motion is particularly innovative. The developed approach is applied to a homecare robot in computer simulation. The results show that the online path planner with Q-learning rapidly and reliably converges to the correct path.
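The three-step scheme above can be sketched in code. The following is a minimal, illustrative sketch only: tabular Q-learning over a roadmap stored as a graph, with a stepsize hook where a regime-switch detector could reset the learning rate. The toy graph, reward values, and the particular stepsize schedule are all assumptions for illustration, not the paper's exact formulation.

```python
import random

# Roadmap as an adjacency list: node -> collision-free neighbor nodes.
# (Toy graph standing in for a PRM; assumed, not from the paper.)
GRAPH = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
GOAL = 4

def step_reward(node):
    # Reward reaching the goal; small penalty per move to favor short paths.
    return 10.0 if node == GOAL else -1.0

def dynamic_stepsize(t, regime_switched):
    # Illustrative dynamic stepsize: decay as 1/(1+t), but jump back to a
    # larger value when the environment regime switches, so the Q-values
    # can re-adapt to the new dynamics.
    return 0.5 if regime_switched else 1.0 / (1.0 + t)

def q_learning(episodes=500, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in GRAPH for a in GRAPH[s]}
    t = 0
    for _ in range(episodes):
        s = 0  # start node of every episode
        while s != GOAL:
            # epsilon-greedy choice among roadmap neighbors
            if rng.random() < eps:
                a = rng.choice(GRAPH[s])
            else:
                a = max(GRAPH[s], key=lambda n: Q[(s, n)])
            r = step_reward(a)
            best_next = 0.0 if a == GOAL else max(Q[(a, n)] for n in GRAPH[a])
            # A regime detector would set regime_switched=True here when the
            # obstacle dynamics change; kept False in this static sketch.
            alpha = dynamic_stepsize(t, regime_switched=False)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = a
            t += 1
    return Q

def greedy_path(Q, start=0):
    # Read the learned path off the roadmap by following greedy actions.
    path, s = [start], start
    while s != GOAL and len(path) < 10:
        s = max(GRAPH[s], key=lambda n: Q[(s, n)])
        path.append(s)
    return path
```

On this toy roadmap, `greedy_path(q_learning())` traces a shortest route from node 0 to the goal node 4; the `dynamic_stepsize` hook marks where the regime-switching mechanism would influence convergence.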
