A reinforcement learning approach towards autonomous suspended load manipulation using aerial robots

In this paper, we address trajectory tracking for a suspended load carried by a rotorcraft aerial robot. The reference trajectory is specified for the suspended load only; the aerial robot must learn its own trajectory such that the load tracks that reference. As a solution, we propose a method based on least-squares policy iteration (LSPI), a reinforcement learning algorithm. The proposed method is verified through simulation and experiments.
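To illustrate the general idea behind LSPI, the following is a minimal sketch on a toy problem and not the method used in this paper. Everything in it is an assumption made for illustration: the five-state chain environment, the one-hot basis over state-action pairs, the discount factor, and all function names. It alternates an LSTD-Q policy evaluation step, which solves a linear system for the weights of an approximate action-value function, with greedy policy improvement.

```python
import numpy as np

# Hypothetical toy setup: a 5-state chain with two actions (0 = left, 1 = right).
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9


def features(s, a):
    """One-hot basis over (state, action) pairs (an illustrative choice)."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi


def step(s, a):
    """Deterministic chain dynamics; reward 1 when the next state is the last state."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s_next, float(s_next == N_STATES - 1)


def greedy_action(w, s):
    """Action maximizing the approximate Q-value phi(s, a)^T w."""
    return int(np.argmax([features(s, a) @ w for a in range(N_ACTIONS)]))


def lstdq(samples, w):
    """LSTD-Q: evaluate the policy that is greedy with respect to w."""
    k = N_STATES * N_ACTIONS
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        phi = features(s, a)
        phi_next = features(s_next, greedy_action(w, s_next))
        A += np.outer(phi, phi - GAMMA * phi_next)
        b += phi * r
    return np.linalg.solve(A, b)


def lspi(samples, max_iter=50, tol=1e-8):
    """Repeat evaluation and greedy improvement until the weights stop changing."""
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(max_iter):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w


# One sample (s, a, r, s') for every state-action pair of the toy chain.
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        s_next, r = step(s, a)
        samples.append((s, a, r, s_next))

w = lspi(samples)
policy = [greedy_action(w, s) for s in range(N_STATES)]
print(policy)
```

In this toy chain the learned greedy policy should move right in every state, since reward is only collected at the right end. A continuous problem such as load trajectory tracking would need a different basis (for example polynomial or radial basis functions) and samples gathered from simulation or flight data, neither of which is shown here.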
