Convergence Analysis using non-squares estimators to approximate the solution of HJB-Riccati equation for the design DLQR via HDP

The proposed methodology develops online algorithms that approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation using a family of non-squares approximators. These approximators are used in the critic for the adaptive solution of the Discrete Algebraic Riccati Equation (DARE) associated with the Discrete Linear Quadratic Regulator (DLQR) problem. The method is evaluated on a fourth-order multivariable dynamic system with two inputs and is compared with the standard recursive least squares (RLS) algorithm.
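For the DLQR problem the HJB equation reduces to the DARE, and the value-iteration recursion that underlies HDP can be illustrated with a minimal model-based sketch. The code below is not the paper's method: it uses the known system matrices directly instead of an online critic with non-squares or RLS estimators, and the fourth-order, two-input matrices `A`, `B` and the weights `Q`, `R` are hypothetical values chosen only for illustration.

```python
import numpy as np


def lqr_gain(A, B, R, P):
    """Feedback gain K = (R + B'PB)^-1 B'PA for a given cost matrix P."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def riccati_step(A, B, Q, R, P):
    """One value-iteration update: Q + A'PA - A'PB (R + B'PB)^-1 B'PA."""
    K = lqr_gain(A, B, R, P)
    return Q + A.T @ P @ A - A.T @ P @ B @ K


def riccati_value_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Iterate the Riccati update from P = 0 until successive iterates agree."""
    P = np.zeros_like(Q, dtype=float)
    for j in range(1, max_iter + 1):
        P_next = riccati_step(A, B, Q, R, P)
        P_next = 0.5 * (P_next + P_next.T)  # keep the iterate symmetric
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, j
        P = P_next
    raise RuntimeError("value iteration did not converge")


# Hypothetical 4th-order system with two inputs (illustrative values only).
A = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.0, 0.8, 0.1, 0.0],
              [0.0, 0.0, 0.7, 0.1],
              [0.0, 0.0, 0.0, 0.6]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
Q = np.eye(4)  # assumed state weighting
R = np.eye(2)  # assumed input weighting

P, n_iter = riccati_value_iteration(A, B, Q, R)
K = lqr_gain(A, B, R, P)

# DARE residual: close to zero when P satisfies the Riccati equation.
residual = np.max(np.abs(riccati_step(A, B, Q, R, P) - P))
# Closed-loop spectral radius: below 1 for a stabilizing gain.
rho = np.max(np.abs(np.linalg.eigvals(A - B @ K)))

print(f"iterations: {n_iter}")
print(f"DARE residual: {residual:.2e}")
print(f"closed-loop spectral radius: {rho:.4f}")
```

In an online HDP setting the update inside `riccati_step` would instead be estimated from measured state and input data by the critic, which is where the choice between an RLS estimator and a non-squares estimator comes in.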
