Preference-balancing motion planning under stochastic disturbances

Physical stochastic disturbances, such as wind, often affect the motion of robots that perform complex tasks in real-world conditions. These disturbances pose a control challenge because the resulting drift induces uncertainty and changes in the robot's speed and direction. This paper presents an online control policy based on supervised machine learning, Least Squares Axial Sum Policy Approximation (LSAPA), that generates trajectories for robotic preference-balancing tasks under stochastic disturbances. The task is learned offline with reinforcement learning, assuming no disturbances, and trajectories are then planned online in the presence of disturbances using the currently observed information. We model the robot as a stochastic control-affine system with unknown dynamics impacted by a Gaussian process, and the task as a continuous Markov Decision Process. Replacing a traditional greedy policy, LSAPA works for high-dimensional control-affine systems impacted by stochastic disturbances, and its computation is linear in the input dimensionality. We verify the method on Swing-free Aerial Cargo Delivery and Rendezvous tasks. Results show that LSAPA selects an input an order of magnitude faster than comparative methods while rejecting a range of stochastic disturbances. Further, experiments on a quadrotor demonstrate that LSAPA produces trajectories that are suitable for physical systems.

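To make the axis-wise idea described above concrete, the following is a minimal Python sketch of a per-axis least-squares policy for a control-affine system with a learned value function. It is an illustration under stated assumptions, not the paper's implementation: the `value` and `simulate` functions, the box bound `U_MAX`, the sample count `SAMPLES_PER_AXIS`, and the placeholder dynamics are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical problem sizes; the paper's tasks use larger state/input spaces.
STATE_DIM, INPUT_DIM = 4, 2
U_MAX = 1.0           # assumed symmetric box constraint on each input axis
SAMPLES_PER_AXIS = 7  # 1-D samples used in each least-squares fit


def value(state):
    """Stand-in for the value function learned offline with RL.

    A negative quadratic in the state keeps the sketch runnable; in the
    paper this would be the learned feature-based approximation.
    """
    return -np.dot(state, state)


def simulate(state, u, disturbance=None):
    """Stand-in control-affine step s' = f(s) + g(s) u (+ disturbance)."""
    f = 0.95 * state                      # placeholder drift dynamics
    g = np.ones((STATE_DIM, INPUT_DIM))   # placeholder input coupling
    nxt = f + g @ u
    if disturbance is not None:
        nxt = nxt + disturbance
    return nxt


def axial_policy(state):
    """Select an input one axis at a time via least-squares quadratic fits.

    For each input axis, sample the value of the successor state along that
    axis alone, fit a quadratic by least squares, and take its maximizer
    (clipped to the input box). The cost grows linearly with the number of
    input axes, unlike a greedy search over a full input grid.
    """
    u_star = np.zeros(INPUT_DIM)
    for i in range(INPUT_DIM):
        u_samples = np.linspace(-U_MAX, U_MAX, SAMPLES_PER_AXIS)
        vals = []
        for ui in u_samples:
            u = np.zeros(INPUT_DIM)
            u[i] = ui
            vals.append(value(simulate(state, u)))
        # Least-squares fit along this axis: V(u_i) ~ a*u_i^2 + b*u_i + c
        A = np.vstack([u_samples**2, u_samples, np.ones_like(u_samples)]).T
        a, b, _ = np.linalg.lstsq(A, np.asarray(vals), rcond=None)[0]
        if a < 0:                        # concave fit: interior maximizer
            u_star[i] = np.clip(-b / (2 * a), -U_MAX, U_MAX)
        else:                            # otherwise fall back to best sample
            u_star[i] = u_samples[int(np.argmax(vals))]
    return u_star


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = np.array([1.0, -0.5, 0.3, 0.0])
    for _ in range(20):
        u = axial_policy(s)
        # Gaussian disturbance applied at execution time, not during planning.
        s = simulate(s, u, disturbance=rng.normal(0.0, 0.05, STATE_DIM))
    print("final state:", s)
```

The fallback to the best sampled input when the fitted quadratic is not concave is a design choice for this sketch; the actual LSAPA formulation should be consulted for how such cases are handled.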