RLS Algorithms and Convergence Analysis Method for Online DLQR Control Design via Heuristic Dynamic Programming

In this paper, a method is proposed for designing online optimal policies that combines approximation of the Hamilton-Jacobi-Bellman (HJB) equation solution with the heuristic dynamic programming (HDP) approach. Recursive least squares (RLS) algorithms are developed to approximate the HJB equation solution, supported by a sequence of greedy policies. The convergence properties of a family of RLS algorithms and their numerical complexity are investigated in the context of reinforcement learning and optimal control. The algorithms are computationally evaluated on an electric circuit model that represents a MIMO dynamic system. The results presented herein emphasize the convergence behaviour of the RLS, projection, and Kaczmarz algorithms developed for online applications.
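The core building block named in the abstract is the recursive least squares update. As a minimal, hedged sketch (the function name, forgetting factor, and regression setup below are illustrative, not taken from the paper), a standard RLS step that could underpin such an online parameter-estimation scheme looks like:

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least-squares step (illustrative sketch).

    theta : current parameter estimate, shape (n,)
    P     : current inverse-correlation matrix, shape (n, n)
    phi   : regressor vector, shape (n,)
    y     : scalar target
    lam   : forgetting factor (1.0 = no forgetting)
    """
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)          # gain vector
    theta = theta + k * (y - phi @ theta)  # correct estimate by the innovation
    P = (P - np.outer(k, Pphi)) / lam      # rank-one covariance update
    return theta, P

# Illustrative use: recover fixed weights from noisy linear measurements,
# the same kind of regression solved when fitting a value-function model.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
theta = np.zeros(3)
P = 1e3 * np.eye(3)                        # large P0: weak prior on theta
for _ in range(500):
    phi = rng.standard_normal(3)
    y = phi @ w_true + 1e-3 * rng.standard_normal()
    theta, P = rls_update(theta, P, phi, y)
```

In an HDP setting, `phi` would be the quadratic basis built from the state (and action) vector and `y` the observed one-step cost target, with `theta` parameterizing the value function; those correspondences are assumptions for illustration, not a reproduction of the paper's algorithm.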
