Hamiltonian-Driven Adaptive Dynamic Programming Based on Extreme Learning Machine

In this paper, a novel framework of reinforcement learning for continuous-time dynamical systems is presented, based on the Hamiltonian functional and the extreme learning machine. The idea of solution search from optimization is introduced to find the optimal control policy in the optimal control problem. The search consists of three steps: evaluation, comparison, and improvement of an arbitrary admissible policy. The Hamiltonian functional plays a central role in this framework, under which only one critic is required in the adaptive critic structure. The critic network is implemented by an extreme learning machine. Finally, a simulation study is conducted to verify the effectiveness of the presented algorithm.
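To make the evaluate-and-improve loop concrete, the sketch below shows one generic way a Hamiltonian-driven policy iteration with an ELM critic can be wired up. It is a minimal illustration, not the paper's implementation: the scalar linear plant x' = -x + u with quadratic cost x^2 + u^2, the network size, and all function names are assumptions chosen so the least-squares structure is visible. The critic V(x) ~ w^T tanh(Ax + b) keeps its random hidden layer fixed (the ELM idea), so policy evaluation, i.e. enforcing H(x, u, dV/dx) = 0 along sampled states, becomes an ordinary linear least-squares problem in the output weights w, and policy improvement takes the minimizer of the Hamiltonian.

```python
# Hypothetical sketch of Hamiltonian-driven policy iteration with an ELM critic.
# Plant, cost, and sizes are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

# ELM critic V(x) ~ w^T tanh(A x + b): A, b stay random and untrained.
N = 30                              # number of hidden neurons (assumed)
A = rng.normal(size=(N, 1))         # random input weights
b = rng.normal(size=N)              # random biases

def hidden_jac(x):
    """Jacobian of the hidden layer tanh(Ax + b) w.r.t. x, shape (N, 1)."""
    s = np.tanh(A @ x + b)
    return (1.0 - s**2)[:, None] * A

f = lambda x: -x                    # drift f(x) of the assumed plant x' = -x + u
g = lambda x: np.ones((1, 1))       # input gain g(x)
Q = lambda x: float(x @ x)          # state cost
R = np.array([[1.0]])               # control cost

def evaluate(policy, xs):
    """Policy evaluation: enforce H = dV^T (f + g u) + Q + u^T R u = 0
    at sampled states; linear least squares in the ELM output weights w."""
    Phi, y = [], []
    for x in xs:
        u = policy(x)
        Phi.append(hidden_jac(x) @ (f(x) + g(x) @ u))   # regressor, shape (N,)
        y.append(-(Q(x) + u @ R @ u))                   # target: -running cost
    w, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    return w

def improve(w):
    """Policy improvement: u = -1/2 R^{-1} g^T dV minimizes the Hamiltonian."""
    return lambda x: -0.5 * np.linalg.solve(R, g(x).T @ (hidden_jac(x).T @ w))

policy = lambda x: -x                                   # admissible initial policy
xs = [rng.uniform(-2.0, 2.0, size=1) for _ in range(200)]
for _ in range(5):                                      # evaluate / improve cycles
    w = evaluate(policy, xs)
    policy = improve(w)

# For this assumed LQR instance the Riccati optimum is u = -(sqrt(2)-1) x ~ -0.414 x.
print("u(1.0) ~", policy(np.array([1.0])))
```

Because the hidden layer never moves, each evaluation step is a single least-squares solve rather than gradient training, which is the practical appeal of pairing the single-critic Hamiltonian structure with an ELM.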
