Goal-driven dynamics learning via Bayesian optimization

Real-world robots are becoming increasingly complex and commonly act in poorly understood environments where it is extremely challenging to model or learn their true dynamics. In such settings, it can be preferable to take a task-specific approach: rather than learning the true dynamics, explicitly learn the dynamics model that achieves the best control performance on the task at hand. In this work, we use Bayesian optimization in an active learning framework in which a locally linear dynamics model is learned with the intent of maximizing control performance and is used, in conjunction with optimal control schemes, to efficiently design a controller for the given task. The model is updated iteratively, based directly on the performance observed in experiments on the physical system, until the desired performance is achieved. We demonstrate the efficacy of the proposed approach through simulations and real experiments on a quadrotor testbed.
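
To make the loop concrete, here is a minimal, self-contained sketch of the idea in Python. It is not the paper's implementation: the dimensions, the toy "true" plant and quadratic cost inside `experiment_cost`, the random-candidate acquisition search, and the use of scikit-learn's Gaussian process with an Expected Improvement criterion are all illustrative assumptions. The sketch only preserves the core structure: Bayesian optimization treats the parameters of a linear dynamics model as the decision variables, an LQR controller is designed for each candidate model, and the resulting task cost observed "on the system" drives the next model choice.

```python
import numpy as np
from scipy.linalg import solve_discrete_are
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

n, m = 2, 1                        # state/input dimensions (illustrative)
Q, R = np.eye(n), 0.1 * np.eye(m)  # quadratic task-cost weights (assumed)
PENALTY = 1e6                      # cost assigned to infeasible/unstable models

def lqr_gain(theta):
    """Design an infinite-horizon discrete-time LQR gain for the candidate
    linear model theta = (vec(A), vec(B))."""
    A = theta[: n * n].reshape(n, n)
    B = theta[n * n :].reshape(n, m)
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def experiment_cost(K):
    """Stand-in for a rollout on the physical system: a toy 'true' plant
    that the learned model is never required to match."""
    A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
    B_true = np.array([[0.0], [0.1]])
    x, cost = np.array([1.0, 0.0]), 0.0
    for _ in range(50):
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A_true @ x + B_true @ u
        if cost > PENALTY:         # unstable closed loop: bail out early
            return PENALTY
    return cost

def evaluate(theta):
    """Controller design plus experiment; failed Riccati solves get a penalty."""
    try:
        return experiment_cost(lqr_gain(theta))
    except (np.linalg.LinAlgError, ValueError):
        return PENALTY

rng = np.random.default_rng(0)
d = n * n + n * m                   # dimension of the model parameter vector
thetas = list(rng.uniform(-1.0, 1.0, size=(5, d)))  # initial random models
costs = [evaluate(t) for t in thetas]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):                 # Bayesian-optimization iterations
    gp.fit(np.array(thetas), np.array(costs))
    cands = rng.uniform(-1.0, 1.0, size=(500, d))   # random acquisition search
    mu, sigma = gp.predict(cands, return_std=True)
    best = min(costs)
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # Expected Improvement
    theta_next = cands[int(np.argmax(ei))]
    thetas.append(theta_next)       # run the experiment with the new model
    costs.append(evaluate(theta_next))

print("best observed task cost:", min(costs))
```

The design point the sketch preserves is that the Gaussian process models the task cost as a function of the dynamics-model parameters, so the model that minimizes the experimental cost need not coincide with the true dynamics.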
