Two coupled neural-networks-based solution of the Hamilton-Jacobi-Bellman equation

Abstract: This work investigates the determination of optimal neuro-feedback control for discrete-time nonlinear systems. The basic idea is to use two coupled neural networks to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation and to obtain a robust closed-loop feedback control law. The learning algorithm is a modified version of backpropagation. As an illustration, a numerical example of a nonlinear discrete-time system is considered. Simulation results show the effectiveness of the proposed method.
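
For orientation, the discrete-time form of the HJB equation and one common way to couple two networks to it can be sketched as follows. This is a generic sketch, not necessarily the formulation used in this work: the dynamics f, the stage cost r, the value network \hat{V} with weights w_c, and the control network \hat{u} with weights w_a are all assumed symbols that the abstract does not define.

```latex
% Assumed notation: state x_k, control u_k, dynamics f, stage cost r >= 0.
\begin{align}
  x_{k+1} &= f(x_k, u_k), \\
  % Discrete-time HJB (Bellman optimality) equation for the optimal cost-to-go V^*:
  V^*(x_k) &= \min_{u_k} \bigl[\, r(x_k, u_k) + V^*\bigl(f(x_k, u_k)\bigr) \bigr], \\
  % Optimal feedback law implied by the equation above:
  u^*(x_k) &= \arg\min_{u_k} \bigl[\, r(x_k, u_k) + V^*\bigl(f(x_k, u_k)\bigr) \bigr]. \\
  % Hypothetical coupling: one network approximates V^*, the other approximates u^*,
  % and the residual e_k of the HJB equation links the two.
  e_k &= r\bigl(x_k, \hat{u}(x_k; w_a)\bigr)
        + \hat{V}\bigl(f(x_k, \hat{u}(x_k; w_a)); w_c\bigr)
        - \hat{V}(x_k; w_c).
\end{align}
```

Under these assumptions, a gradient-based scheme such as backpropagation would adjust w_c to drive the residual e_k toward zero, and adjust w_a to reduce the bracketed cost r + \hat{V}, so that each network's update depends on the other's current output.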
