Inverse Data-Driven Optimal Control for Nonlinear Stochastic Non-stationary Systems

We consider the problem of estimating the possibly non-convex cost of an agent by observing its interactions with a nonlinear, non-stationary and stochastic environment. For this inverse problem, we give a result that allows the cost to be estimated by solving a convex optimization problem. To obtain this result we also tackle a forward problem, which leads us to formulate a finite-horizon optimal control problem for which we show convexity and find the optimal solution. Our approach leverages certain probabilistic descriptions that can be obtained either from data or from first principles. The effectiveness of our results, which are turned into an algorithm, is illustrated via simulations on the problem of estimating the cost of an agent that is stabilizing the unstable equilibrium of a pendulum.
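The paper's own convex formulation is not reproduced in this abstract. Purely as a loose illustration of the general idea of recovering a cost from observed behaviour by solving a convex program, the sketch below fits the parameters of a cost that is linear in hand-crafted features, under the assumption that the observed agent acts through a Boltzmann (maximum-entropy style) policy. This policy assumption, the synthetic data, and all names in the code are illustrative stand-ins from the inverse reinforcement learning literature, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative, not the paper's model): discrete states and actions,
# cost assumed linear in features, c(s, a) = theta . phi(s, a).
n_states, n_actions, n_features = 20, 5, 8
phi = rng.normal(size=(n_states, n_actions, n_features))  # feature map phi(s, a)
theta_true = rng.normal(size=n_features)                   # unknown cost parameters

def policy(theta):
    """Boltzmann policy assumption: pi(a|s) proportional to exp(-c(s, a))."""
    logits = -phi @ theta                                   # shape (n_states, n_actions)
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Simulated demonstrations: observed (state, action) pairs from the agent.
n_samples = 5000
states = rng.integers(n_states, size=n_samples)
pi_true = policy(theta_true)
actions = np.array([rng.choice(n_actions, p=pi_true[s]) for s in states])

# Inverse step: under the policy assumption above, the negative log-likelihood
#   NLL(theta) = sum_i [ theta.phi(s_i, a_i) + log sum_a exp(-theta.phi(s_i, a)) ]
# is convex in theta (affine terms plus log-sum-exp of affine terms), so plain
# gradient descent reaches the global minimizer.
theta = np.zeros(n_features)
lr = 0.1
for _ in range(500):
    pi = policy(theta)
    # grad NLL = phi(s_i, a_i) - E_{a ~ pi(.|s_i)}[phi(s_i, a)], averaged over samples
    expected_phi = np.einsum('sa,saf->sf', pi, phi)
    grad = (phi[states, actions] - expected_phi[states]).mean(axis=0)
    theta -= lr * grad

# Recovery is only up to cost components that leave the policy unchanged.
print("recovered cost parameters:", np.round(theta, 2))
print("true cost parameters:     ", np.round(theta_true, 2))
```

The convexity of the fitting step is what makes this toy inverse problem tractable; the paper's contribution, by contrast, is a convex estimation result for nonlinear, stochastic and non-stationary dynamics, which the sketch above does not attempt to capture.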
