Avoiding moving obstacles with stochastic hybrid dynamics using PEARL: PrEference Appraisal Reinforcement Learning

Manual derivation of optimal robot motions for task completion is difficult, especially when a robot is required to balance its actions between opposing preferences. One proposed solution is to learn near-optimal motions automatically with Reinforcement Learning (RL), which has been successful for several tasks, including swing-free UAV flight, table tennis, and autonomous driving. However, high-dimensional problems remain a challenge. We address this dimensionality constraint with PrEference Appraisal Reinforcement Learning (PEARL), which solves tasks with opposing preferences for acceleration-controlled robots. PEARL projects the high-dimensional continuous robot state space onto a low-dimensional preference feature space, resulting in efficient and adaptable planning. We demonstrate, on a dynamic obstacle avoidance task, that an agent trained once on a much simpler problem performs real-time decision-making on significantly larger, high-dimensional problems with unbounded continuous states and actions. The agent is trained with only four static obstacles, yet it avoids up to 900 moving obstacles with complex stochastic hybrid dynamics in a highly constrained space, using only limited information about the environment. We compare against traditional, often manually tuned, solutions for these high-dimensional problems.
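
As a rough, hypothetical illustration of the projection idea (the abstract does not specify PEARL's actual feature set), a state containing arbitrarily many obstacles could be reduced to a fixed-size preference feature vector along these lines:

    import numpy as np

    def preference_features(robot_pos, robot_vel, goal_pos, obstacle_pos):
        """Project a high-dimensional state (robot plus many obstacles) onto a
        small, fixed set of preference features. The specific features here are
        illustrative assumptions, not the paper's definition."""
        to_goal = goal_pos - robot_pos
        # Preference 1: negative distance to the goal (attraction).
        f_goal = -np.linalg.norm(to_goal)
        # Preference 2: speed of progress toward the goal.
        f_progress = np.dot(robot_vel, to_goal) / (np.linalg.norm(to_goal) + 1e-9)
        # Preference 3: inverse distance to the nearest obstacle (repulsion).
        dists = np.linalg.norm(obstacle_pos - robot_pos, axis=1)
        f_obstacle = -1.0 / (dists.min() + 1e-9)
        return np.array([f_goal, f_progress, f_obstacle])

    # The feature vector has the same length regardless of obstacle count,
    # so a policy trained with 4 obstacles can be evaluated in a scene
    # with hundreds of moving obstacles.
    robot_pos = np.zeros(2)
    robot_vel = np.array([1.0, 0.0])
    goal_pos = np.array([5.0, 0.0])
    obstacles = np.random.uniform(-10.0, 10.0, size=(900, 2))
    print(preference_features(robot_pos, robot_vel, goal_pos, obstacles))

Because planning happens in this small, fixed-dimension feature space, the cost of a decision does not grow with the raw state dimension, which is what makes real-time operation plausible in the 900-obstacle setting.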
