Automated aerial suspended cargo delivery through reinforcement learning

Abstract: Cargo-bearing unmanned aerial vehicles (UAVs) have tremendous potential to assist humans by delivering food, medicine, and other supplies. For time-critical cargo delivery tasks, UAVs need to be able to quickly navigate their environments and deliver suspended payloads with bounded load displacement. This is challenging because it is a constraint-balancing task over the joint dynamics of the UAV-suspended load system. This article presents a reinforcement learning approach to aerial cargo delivery tasks in environments with static obstacles. We first learn a minimal-residual-oscillations task policy in obstacle-free environments using a specifically designed feature vector for value function approximation that allows generalization beyond the training domain. The method works in continuous state and discrete action spaces. Since planning for aerial cargo requires a very large action space (over 10^6 actions) that is impractical for learning, we define formal conditions for a class of robotics problems in which learning can occur in a simplified problem space and successfully transfer to a broader problem space. Exploiting these guarantees and relying on the discrete action space, we learn the swing-free policy in a subspace several orders of magnitude smaller, and later develop a method for swing-free trajectory planning along a path. As an extension to tasks in environments with static obstacles, where the load displacement must be bounded throughout the trajectory, sampling-based motion planning generates collision-free paths. A reinforcement learning agent then transforms these paths into trajectories that maintain the bound on the load displacement while following the collision-free path in a timely manner. We verify the approach both in simulation and in experiments on a quadrotor with a suspended load, and confirm the method's safety and feasibility through a demonstration in which a quadrotor delivers an open container of liquid to a human subject.
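The learning machinery the abstract describes can be sketched as approximate value iteration with a linear value function over a hand-designed feature vector and a discrete action set. The toy UAV-load dynamics, quadratic features, and reward below are illustrative assumptions for a minimal sketch, not the article's actual quadrotor-load model or feature design.

```python
import numpy as np

def features(state):
    # state = (position error p, velocity v, load angle a, angular rate w).
    # Quadratic features let a linear value function penalize goal distance
    # and load displacement, and generalize beyond the training states.
    p, v, a, w = state
    return np.array([1.0, p * p, v * v, a * a, w * w])

def step(state, action, dt=0.1):
    # Toy joint UAV-load dynamics: a double integrator for the vehicle
    # coupled to a damped pendulum for the load (illustrative only).
    p, v, a, w = state
    g, length, damping = 9.81, 0.5, 0.2
    w_new = w + dt * (-(g / length) * a - damping * w - action / length)
    a_new = a + dt * w_new
    v_new = v + dt * action
    p_new = p + dt * v_new
    return np.array([p_new, v_new, a_new, w_new])

def reward(state):
    # Penalize goal distance and load swing (minimal residual oscillations).
    p, v, a, w = state
    return -(p * p + 0.1 * v * v + a * a + 0.1 * w * w)

def backup(state, action, theta, gamma):
    # One-step Bellman backup under the current value estimate.
    nxt = step(state, action)
    return reward(nxt) + gamma * features(nxt) @ theta

def fit_value_iteration(actions, n_samples=500, n_iters=30, gamma=0.9, seed=0):
    # Approximate value iteration: sample states, back each one up over the
    # discrete action set, refit the linear value function by least squares.
    rng = np.random.default_rng(seed)
    theta = np.zeros(5)
    for _ in range(n_iters):
        states = rng.uniform(-1.0, 1.0, size=(n_samples, 4))
        X = np.array([features(s) for s in states])
        y = np.array([max(backup(s, u, theta, gamma) for u in actions)
                      for s in states])
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def greedy_action(state, theta, actions, gamma=0.9):
    # The learned policy: act greedily with respect to the value function.
    return max(actions, key=lambda u: backup(state, u, theta, gamma))
```

Because the value function is a function of features rather than raw states, a policy trained on a small sampled subspace can be queried on states outside the training domain, which is the kind of transfer the article's formal conditions address.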
The contributions of this work are twofold. First, this article presents a solution to the challenging and vital problem of planning a constraint-balancing task for an inherently unstable, non-linear system in the presence of obstacles. Second, both AI and robotics researchers can benefit from the provided theoretical guarantees of system stability for a class of constraint-balancing tasks that occur in very large action spaces.
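The collision-free path generation step that feeds the learned trajectory agent can be sketched as a basic probabilistic roadmap (PRM) in a 2-D workspace. The circular obstacles, workspace bounds, and breadth-first query below are simplifying assumptions for illustration, not the article's planner.

```python
from collections import deque

import numpy as np

def collision_free(p, obstacles, radius=0.5):
    # A point is free if it clears every (circular) obstacle.
    return all(np.linalg.norm(p - o) > radius for o in obstacles)

def edge_free(a, b, obstacles, step=0.05):
    # Check the straight segment between two nodes at a fixed resolution.
    n = max(2, int(np.linalg.norm(b - a) / step))
    return all(collision_free(a + t * (b - a), obstacles)
               for t in np.linspace(0.0, 1.0, n))

def build_prm(start, goal, obstacles, n_nodes=200, k=8, seed=0):
    # Sample collision-free nodes, then connect each node to its k nearest
    # neighbours with collision-checked edges. Start is node 0, goal node 1.
    rng = np.random.default_rng(seed)
    nodes = [np.asarray(start, dtype=float), np.asarray(goal, dtype=float)]
    while len(nodes) < n_nodes + 2:
        q = rng.uniform(0.0, 10.0, size=2)
        if collision_free(q, obstacles):
            nodes.append(q)
    edges = {i: [] for i in range(len(nodes))}
    for i, q in enumerate(nodes):
        dists = [np.linalg.norm(q - r) for r in nodes]
        for j in np.argsort(dists)[1:k + 1]:  # skip self at index 0
            j = int(j)
            if edge_free(q, nodes[j], obstacles):
                edges[i].append(j)
                edges[j].append(i)
    return nodes, edges

def shortest_path(nodes, edges, src=0, dst=1):
    # Breadth-first search: fewest-edge roadmap path from start to goal.
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return [nodes[i] for i in reversed(path)]
        for v in edges[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None
```

The resulting waypoint path is geometric only; in the article's pipeline it is the learned agent, not the planner, that turns such a path into a dynamically feasible trajectory with bounded load displacement.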
