Bridging Reinforcement Learning and Iterative Learning Control: Autonomous Reference Tracking for Unknown, Nonlinear Dynamics

This work addresses the problem of reference tracking in autonomously learning agents with unknown, nonlinear dynamics. Existing solutions require model information or extensive parameter tuning, and have rarely been validated in real-world experiments. We propose a learning control scheme that approximates the unknown dynamics with a Gaussian process (GP), which is used to optimize and apply a feedforward control input on each trial. Unlike existing approaches, the proposed method requires neither knowledge of the system states and their dynamics nor knowledge of an effective feedback control structure. All algorithm parameters are chosen automatically, i.e., the learning method works plug-and-play. The proposed method is validated in extensive simulations and real-world experiments. In contrast to most existing work, we study learning dynamics for more than one motion task, as well as the robustness of performance across a large range of learning parameters. The method's plug-and-play applicability is demonstrated in experiments with a balancing robot, in which the proposed method rapidly learns to track the desired output. Due to its model-agnostic, plug-and-play properties, the proposed method is expected to have high potential for application to a large class of reference tracking problems in systems with unknown, nonlinear dynamics.
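The trial-wise learning loop described above can be illustrated with a minimal sketch: a GP is fit to all input/output data observed so far, and the next trial's feedforward input is the one whose GP-predicted output best matches the reference. Everything here is a simplifying assumption for illustration, not the paper's implementation: `plant` is a hypothetical scalar static nonlinearity standing in for the unknown dynamics, the initial probe inputs represent an assumed exploration phase, and a grid search replaces the paper's input optimizer.

```python
import numpy as np

def rbf_kernel(a, b, ls=1.0, var=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def gp_mean(X, y, Xs, noise=1e-4):
    """Posterior mean of a zero-mean GP fit to (X, y), evaluated at Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    return rbf_kernel(Xs, X) @ np.linalg.solve(K, y)

def plant(u):
    """Hypothetical unknown nonlinear system (stand-in for the real dynamics)."""
    return u + 0.3 * np.sin(2.0 * u)

r = 1.5                               # reference output to track
U = [-2.0, 0.0, 2.0]                  # initial probe inputs (assumed exploration)
Y = [plant(u0) for u0 in U]
cand = np.linspace(-3.0, 3.0, 601)    # candidate feedforward inputs

errors = []
for trial in range(6):
    # Fit the GP to all data seen so far and pick the feedforward input
    # whose predicted output is closest to the reference.
    mu = gp_mean(np.array(U), np.array(Y), cand)
    u = cand[np.argmin((mu - r) ** 2)]
    y = plant(u)                      # one "trial" on the real system
    errors.append(abs(r - y))
    U.append(u)
    Y.append(y)
```

With each trial, the data set grows near the relevant operating point, so the GP model becomes locally accurate and the tracking error shrinks, mirroring the iterative-learning behavior the abstract describes, without any model information or hand-tuned gains.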