A hybrid learning method for system identification and optimal control

We present a three-step method for system identification and optimal control of nonlinear systems. Our approach is mainly data-driven and does not require active excitation of the system to perform system identification. In particular, it is designed for systems for which only historical data under closed-loop control are available and where the historical control commands exhibit low variability. In the first step, simple simulation models of the system are built and run under various conditions. In the second step, a neural network architecture is extensively trained on the simulation outputs to learn the system physics, and then retrained on historical data from the real system, subject to stopping rules. These stopping rules limit the overfitting that arises when fitting data from closed-loop controlled systems. By doing so, we obtain one (or many) system model(s), represented by this architecture, whose behavior can be chosen to match the real system more or less closely. In the third step, state-of-the-art reinforcement learning with a variant of domain randomization and distributed learning is used for optimal control of the system. We first illustrate the model identification strategy with a simple example, the pendulum with external torque. We then apply our method to model and optimize the control of a large building facility located in Switzerland. Simulation results demonstrate that this approach generates stable, functional controllers that outperform benchmark rule-based controllers on both comfort and energy.
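As an illustration of the first step on the pendulum example, the sketch below simulates a damped pendulum driven by an external torque under randomly drawn physical parameters, producing trajectories on which a network could be pretrained. It is a minimal sketch and not the authors' implementation: the equation of motion, the semi-implicit Euler integrator, the parameter ranges, and the names `simulate_pendulum` and `generate_dataset` are all assumptions made for illustration.

```python
import math
import random


def simulate_pendulum(theta0, omega0, torques, mass=1.0, length=1.0,
                      damping=0.1, g=9.81, dt=0.01):
    """Simulate a damped pendulum with external torque (assumed model).

    Assumed dynamics: theta'' = -(g/l) sin(theta) - (b/I) theta' + u/I,
    with I = m * l^2. Returns one (theta, omega, torque) tuple per step.
    """
    theta, omega = theta0, omega0
    inertia = mass * length ** 2
    trajectory = []
    for u in torques:
        trajectory.append((theta, omega, u))
        alpha = (-(g / length) * math.sin(theta)
                 - (damping / inertia) * omega
                 + u / inertia)
        # Semi-implicit Euler: update velocity first, then position.
        omega += alpha * dt
        theta += omega * dt
    return trajectory


def generate_dataset(n_runs, n_steps, seed=0):
    """Run the simulator under varied conditions (hypothetical ranges)."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        mass = rng.uniform(0.5, 2.0)
        length = rng.uniform(0.5, 2.0)
        damping = rng.uniform(0.0, 0.5)
        theta0 = rng.uniform(-math.pi, math.pi)
        torques = [rng.uniform(-1.0, 1.0) for _ in range(n_steps)]
        runs.append(simulate_pendulum(theta0, 0.0, torques,
                                      mass, length, damping))
    return runs


if __name__ == "__main__":
    data = generate_dataset(n_runs=5, n_steps=200)
    print(len(data), "runs of", len(data[0]), "steps each")
```

Varying mass, length, and damping across runs is one simple way to expose a model to a family of plausible dynamics before it is retrained on the real system's historical data.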
