Approximate Dynamic Programming for Optimal Stationary Control With Control-Dependent Noise

This brief studies the stochastic optimal control problem via reinforcement learning and approximate/adaptive dynamic programming (ADP). A policy iteration algorithm is derived, using Itô calculus, in the presence of both additive and multiplicative noise. The expectation of the approximated cost matrix is guaranteed to converge to the solution of an algebraic Riccati equation that yields the optimal cost value. Moreover, the covariance of the approximated cost matrix can be reduced by increasing the length of the time interval between two consecutive iterations. Finally, a numerical example is given to illustrate the efficiency of the proposed ADP methodology.
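The setting described in the abstract is the stochastic linear-quadratic problem with control-dependent (multiplicative) noise. The sketch below is a minimal, model-based policy-iteration routine in that spirit, assuming the illustrative dynamics dx = (Ax + Bu) dt + Du dw and quadratic weights Q and R; the matrix shapes, the Kronecker-based Lyapunov solver, and the function names (`evaluate_policy`, `improve_policy`, `policy_iteration`) are assumptions introduced for illustration. It is not the brief's data-driven ADP scheme, which avoids explicit knowledge of the system matrices.

```python
import numpy as np

def evaluate_policy(A, B, D, Q, R, K):
    """Cost matrix of the gain K (u = -K x) for dx = (Ax + Bu) dt + Du dw:
    solve the generalized Lyapunov equation
        (A - BK)' P + P (A - BK) + (DK)' P (DK) + Q + K' R K = 0
    by vectorization (an illustrative model-based step, not the data-driven one)."""
    n = A.shape[0]
    Ac = A - B @ K
    DK = D @ K
    I = np.eye(n)
    # Matrix of the linear map P -> Ac'P + P Ac + (DK)'P(DK) acting on vec(P)
    M = np.kron(I, Ac.T) + np.kron(Ac.T, I) + np.kron(DK.T, DK.T)
    rhs = -(Q + K.T @ R @ K).reshape(-1, order="F")
    P = np.linalg.solve(M, rhs).reshape(n, n, order="F")
    return 0.5 * (P + P.T)  # symmetrize against round-off

def improve_policy(B, D, R, P):
    """Policy improvement with the control-dependent-noise correction D'PD."""
    return np.linalg.solve(R + D.T @ P @ D, B.T @ P)

def policy_iteration(A, B, D, Q, R, K0, iterations=20):
    """Alternate policy evaluation and improvement from a stabilizing gain K0."""
    K = K0
    for _ in range(iterations):
        P = evaluate_policy(A, B, D, Q, R, K)
        K = improve_policy(B, D, R, P)
    return P, K

# Example: second-order system with scalar input (numbers chosen only for illustration).
if __name__ == "__main__":
    A = np.array([[0.0, 1.0], [-1.0, -2.0]])
    B = np.array([[0.0], [1.0]])
    D = 0.1 * B                    # control-dependent noise channel
    Q, R = np.eye(2), np.eye(1)
    K0 = np.zeros((1, 2))          # A is Hurwitz, so the zero gain is stabilizing
    P, K = policy_iteration(A, B, D, Q, R, K0)
    print("P =\n", P, "\nK =", K)
```

Each iteration evaluates the current gain by solving a generalized Lyapunov equation and then improves it through the noise-corrected gain update; with a stabilizing initial gain, the iterates are expected to approach the stabilizing solution of the associated algebraic Riccati equation, mirroring the convergence behavior the abstract describes for the expectation of the approximated cost matrix.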
