PEARL: PrEference Appraisal Reinforcement Learning for Motion Planning

Robot motion planning often requires finding trajectories that balance different user intents, or preferences. One of these preferences is usually arrival at the goal, while another might be obstacle avoidance. Here, we formalize these and similar tasks as preference balancing tasks (PBTs) on acceleration-controlled robots, and propose a motion planning solution, PrEference Appraisal Reinforcement Learning (PEARL). PEARL uses reinforcement learning on a restricted training domain, combined with features engineered from user-given intents. PEARL's planner then generates trajectories in expanded domains for more complex problems. We present an adaptation for rejection of stochastic disturbances and offer in-depth analysis, including task completion conditions and an analysis of behavior when those conditions do not hold. PEARL is evaluated on five problems, two multi-agent obstacle-avoidance tasks and three tasks that stochastically disturb the system at run-time: 1) a multi-agent pursuit problem with 1000 pursuers, 2) robot navigation through 900 moving obstacles, which is trained in an environment with only 4 static obstacles, 3) aerial cargo delivery, 4) two-robot rendezvous, and 5) a flying inverted pendulum. Lastly, we evaluate the method on a physical quadrotor UAV with a suspended load influenced by a stochastic disturbance. The video, this https URL, contains the experiments and visualizations of the simulations.
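
To make the idea concrete, the sketch below shows a minimal preference-balancing planner in the spirit of the abstract: preference features (goal arrival and obstacle avoidance) are combined through a learned linear value function, and the planner greedily selects accelerations that maximize that value. The feature definitions, double-integrator dynamics, sampled action set, and weight values are illustrative assumptions for this sketch, not the paper's exact formulation.

import numpy as np

def features(state, goal, obstacles):
    # Preference features built from user intents (assumed forms):
    # arrival -> negative squared distance to goal and speed penalty,
    # avoidance -> negative inverse clearance to the nearest obstacle.
    pos, vel = state[:2], state[2:]
    goal_feat = -np.sum((pos - goal) ** 2) - np.sum(vel ** 2)
    clearance = min(np.linalg.norm(pos - o) for o in obstacles)
    obstacle_feat = -1.0 / max(clearance, 1e-3)
    return np.array([goal_feat, obstacle_feat])

def step(state, accel, dt=0.1):
    # Acceleration-controlled (double-integrator) dynamics:
    # position' = position + velocity*dt, velocity' = velocity + accel*dt.
    pos, vel = state[:2], state[2:]
    return np.concatenate([pos + vel * dt, vel + accel * dt])

def plan(state, goal, obstacles, weights, n_steps=200, a_max=1.0):
    # Greedy planner: at each step, sample candidate accelerations and pick
    # the one whose successor state maximizes the learned value w^T phi(s).
    traj = [state]
    for _ in range(n_steps):
        candidates = np.random.uniform(-a_max, a_max, size=(64, 2))
        values = [weights @ features(step(state, a), goal, obstacles)
                  for a in candidates]
        state = step(state, candidates[int(np.argmax(values))])
        traj.append(state)
    return np.array(traj)

if __name__ == "__main__":
    weights = np.array([1.0, 0.2])          # stand-in for RL-learned weights
    start = np.array([0.0, 0.0, 0.0, 0.0])  # position (x, y), velocity (vx, vy)
    goal = np.array([5.0, 5.0])
    obstacles = [np.array([2.5, 2.5])]
    traj = plan(start, goal, obstacles, weights)
    print("final position:", traj[-1][:2])

In PEARL the weights are learned by reinforcement learning on a small training domain; here they are fixed constants only to keep the example self-contained, and the same planning loop would then be applied unchanged to larger, more cluttered environments.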
