Data-Driven Control with Learned Dynamics: Model-Based versus Model-Free Approach

This paper compares two data-driven control methods, one model-based and one model-free. The first is a recently proposed method, Deep Koopman Representation for Control (DKRC), which uses a deep neural network to map an unknown nonlinear dynamical system to a high-dimensional linear system, enabling the use of state-of-the-art linear control strategies. The second is a classic model-free method based on an actor-critic architecture, Deep Deterministic Policy Gradient (DDPG), which has proven effective across a variety of dynamical systems. The comparison is carried out in OpenAI Gym, which provides multiple control environments for benchmarking. Two examples are used for comparison: the classic Inverted Pendulum and Lunar Lander Continuous Control. Based on the experimental results, we compare the two methods in terms of control strategy and effectiveness under various initialization conditions. We also compare the dynamic model learned by DKRC against the analytical model derived via Euler-Lagrange linearization, demonstrating that a data-driven, sample-efficient approach can accurately recover unknown dynamics.
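To make the DKRC idea concrete, below is a minimal, hypothetical sketch, not the authors' implementation: names such as KoopmanEmbedding, phi, A, and B are illustrative assumptions. A neural network learns a lifting z = phi(x) under a one-step linearity loss so that the unknown nonlinear dynamics become approximately linear in the lifted space, z_{t+1} ≈ A z_t + B u_t, after which a standard linear controller (e.g., LQR) can be applied.

```python
# Hypothetical DKRC-style sketch (illustrative names, not the paper's code):
# jointly learn an embedding phi and lifted-space matrices (A, B) from
# sampled transitions (x_t, u_t, x_{t+1}).
import torch
import torch.nn as nn

state_dim, action_dim, lift_dim = 3, 1, 16  # e.g., pendulum-sized problem

class KoopmanEmbedding(nn.Module):
    """Neural network mapping a raw state x to lifted observables z = phi(x)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, lift_dim),
        )
    def forward(self, x):
        return self.net(x)

phi = KoopmanEmbedding()
A = nn.Parameter(torch.eye(lift_dim))                 # lifted state-transition matrix
B = nn.Parameter(torch.zeros(lift_dim, action_dim))   # lifted control matrix
opt = torch.optim.Adam(list(phi.parameters()) + [A, B], lr=1e-3)

# Placeholder batch; in practice (x, u, x_next) come from trajectories
# sampled in the Gym environment.
x = torch.randn(256, state_dim)
u = torch.randn(256, action_dim)
x_next = torch.randn(256, state_dim)

for _ in range(1000):
    z, z_next = phi(x), phi(x_next)
    pred = z @ A.T + u @ B.T                 # one-step prediction in lifted space
    loss = ((z_next - pred) ** 2).mean()     # ||phi(x') - (A phi(x) + B u)||^2
    # Note: practical formulations add a reconstruction or norm term to rule
    # out the trivial solution phi(x) = 0.
    opt.zero_grad()
    loss.backward()
    opt.step()

# With (A, B) identified, an LQR gain K for the lifted linear system yields
# the control law u = -K * phi(x), applied to the original nonlinear system.
```

DDPG, by contrast, skips the explicit model and learns a deterministic policy directly from the same kind of transition data, which is the core of the model-based versus model-free comparison studied here.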
