Reinforcement Learning Controller Design for Affine Nonlinear Discrete-Time Systems using Online Approximators

In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed using online approximators (OLAs) for a general class of multi-input multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design comprises two entities: an action network designed to produce an optimal control signal and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and critic networks, although any OLA, such as radial basis functions, splines, or fuzzy logic, could be utilized. For the output-feedback counterpart, an additional NN is designated as an observer to estimate the unavailable system states; thus, the separation principle is not required. The NN weight tuning laws for both controller schemes are derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is demonstrated in simulation on a pendulum balancing system and a two-link robotic arm.
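
To make the actor-critic structure concrete, below is a minimal numerical sketch in Python/NumPy of the heuristic dynamic programming loop described above. It assumes radial-basis-function OLAs that are linear in their tunable weights, a quadratic stage cost, an Euler-discretized pendulum as the plant, and plain gradient tuning in place of the paper's Lyapunov-derived weight-update laws; all names, gains, and dynamics are illustrative assumptions, not the authors' implementation.

import numpy as np

def rbf_basis(x, centers, width):
    # Radial-basis feature vector phi(x); one of the OLA choices noted above.
    d = centers - x
    return np.exp(-np.sum(d * d, axis=1) / (2.0 * width ** 2))

rng = np.random.default_rng(0)
n, m, width, dt = 2, 1, 1.0, 0.05        # state dim, input dim, RBF width, step
centers = rng.uniform(-2.0, 2.0, size=(25, n))
Wc = np.zeros(25)                        # critic weights (cost-to-go estimate)
Wa = np.zeros((25, m))                   # action-network weights
alpha_c, alpha_a, gamma = 0.05, 0.02, 0.95

def f(x):                                # assumed drift: Euler-discretized pendulum
    return x + dt * np.array([x[1], -np.sin(x[0])])

def g(x):                                # assumed input-gain matrix of the affine plant
    return dt * np.array([[0.0], [1.0]])

x = np.array([1.0, 0.0])
for k in range(2000):
    phi = rbf_basis(x, centers, width)
    u = Wa.T @ phi                       # action network: control signal
    r = x @ x + float(u @ u)             # quadratic stage cost
    x_next = f(x) + g(x) @ u             # affine plant step (disturbance omitted)

    # Critic tuning: drive J_hat(x_k) toward r + gamma * J_hat(x_{k+1}),
    # the heuristic-dynamic-programming recursion for the cost-to-go.
    phi_next = rbf_basis(x_next, centers, width)
    td_err = r + gamma * (Wc @ phi_next) - Wc @ phi
    Wc += alpha_c * td_err * phi

    # Action tuning: gradient step on r + gamma * J_hat(x_{k+1}) with
    # respect to u, exploiting the affine structure x_{k+1} = f(x) + g(x) u.
    # A simplified stand-in for the paper's Lyapunov-based tuning laws.
    dphi_dx = phi_next[:, None] * (centers - x_next) / width ** 2
    dJ_du = 2.0 * u + gamma * g(x).T @ (dphi_dx.T @ Wc)
    Wa -= alpha_a * np.outer(phi, dJ_du)
    x = x_next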
