Supervisory Output Prediction for Bilinear Systems by Reinforcement Learning

Online output prediction is an indispensable part of any model predictive control implementation, especially when the underlying physical model has been simplified and/or the operating conditions change frequently. Furthermore, the choice of an output prediction model depends strongly on the available data, and designing or altering the data collection process may not be an option. Thus, in several scenarios, the most appropriate prediction model must be selected at runtime. To this end, this paper introduces a supervisory output prediction scheme, tailored specifically to input-output stable bilinear systems, that aims to automate the selection of the most appropriate prediction model at runtime. The selection process is based on a reinforcement-learning scheme in which prediction models are selected according to their prior prediction performance. An additional selection process partitions the control-input domain appropriately, which also allows for switched-system approximations of the original bilinear dynamics. We show analytically that the proposed scheme converges (in probability) to the best model and partition. Finally, we demonstrate these properties through simulations of temperature prediction in residential buildings.
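
To make the idea of performance-based model selection concrete, the following is a minimal sketch and not the scheme analyzed in the paper. It assumes a scalar bilinear system y_{k+1} = a*y_k + b*u_k + n*y_k*u_k, two hypothetical candidate predictors (one linear, one bilinear), a reward in [0, 1] derived from the one-step prediction error, and a standard linear reward-inaction update of the selection probabilities. The names `select_models` and `simulate`, the tolerance `tol`, and all numerical values are illustrative assumptions.

```python
import random


def simulate(a, b, n, inputs, y0=1.0):
    """Simulate y_{k+1} = a*y_k + b*u_k + n*y_k*u_k (assumed scalar bilinear system)."""
    ys = [y0]
    for u in inputs:
        y = ys[-1]
        ys.append(a * y + b * u + n * y * u)
    return ys


def select_models(models, inputs, outputs, step=0.05, tol=0.1, seed=0):
    """Linear reward-inaction selection among candidate one-step predictors.

    models  : list of callables (y_k, u_k) -> predicted y_{k+1}
    inputs  : u_0 .. u_{T-1}
    outputs : measured y_0 .. y_T
    Returns the final selection probabilities, one per model.
    """
    rng = random.Random(seed)
    probs = [1.0 / len(models)] * len(models)
    for k, u in enumerate(inputs):
        # Draw one model according to the current selection probabilities.
        i = rng.choices(range(len(models)), weights=probs)[0]
        err = abs(models[i](outputs[k], u) - outputs[k + 1])
        # Reward in [0, 1]: 1 for a perfect prediction, 0 once the error reaches tol.
        reward = max(0.0, 1.0 - err / tol)
        # Shift probability mass toward the selected model in proportion to its reward;
        # a zero reward leaves the probabilities unchanged (inaction).
        gain = step * reward
        probs = [(1.0 - gain) * p for p in probs]
        probs[i] += gain
    return probs


if __name__ == "__main__":
    rng = random.Random(1)
    us = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    ys = simulate(0.8, 0.5, 0.1, us)
    linear = lambda y, u: 0.8 * y + 0.5 * u                    # ignores the bilinear term
    bilinear = lambda y, u: 0.8 * y + 0.5 * u + 0.1 * y * u    # matches the simulated system
    print(select_models([linear, bilinear], us, ys))
```

Because each update is a convex combination of the current probability vector and a unit vector, the probabilities remain nonnegative and sum to one. The sketch covers only selection among models; the partitioning of the control-input domain mentioned above is not illustrated.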
