A data-driven online ADP with exponential convergence based on a k-nearest-neighbor averager, a stable term, and persistent excitation

With the development of marine science, aeronautics and astronautics, energy, the chemical industry, biomedicine, and management science, many complex systems face optimization and control problems. Approximate dynamic programming (ADP), which has emerged in recent years as an approximate optimization technique, alleviates the curse of dimensionality that limits classical dynamic programming. Building on an analysis of the optimization system, this paper proposes a nonlinear, multi-input multi-output, online, data-driven approximate dynamic programming structure together with its learning algorithm. The method is realized in three parts: 1) the critic function of the multi-dimensional-input critic module is approximated with a data-driven k-nearest-neighbor averager; 2) the multi-output policy iteration in the actor module is computed with exponential convergence; 3) the critic and actor modules are trained synchronously, yielding online optimal control. Optimal control of the longitudinal motion of a thermal underwater glider demonstrates the effectiveness of the proposed method. This work lays a foundation for the theory and application of nonlinear, data-driven, multi-input multi-output approximate dynamic programming, and it addresses optimization and artificial-intelligence needs shared by many scientific and engineering fields, such as energy conservation, emission reduction, decision support, and operations management.
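To make the structure described above more concrete, the following Python sketch illustrates one possible, deliberately simplified realization of such a loop: a k-nearest-neighbor averager serves as the critic, a finite-difference policy-improvement step stands in for the actor update, the two are updated synchronously online, and a small probing signal plays the role of persistent excitation. This is not the authors' implementation; the plant model, quadratic stage cost, discount factor, and learning rates are illustrative assumptions, and the finite-difference actor step is only a placeholder for the paper's exponentially convergent policy iteration.

```python
# Minimal, illustrative sketch (not the authors' implementation) of an online
# actor-critic ADP loop whose critic is a k-nearest-neighbor averager.
# The plant dynamics, stage cost, gains, and learning rates are assumptions
# chosen only to keep the example self-contained and runnable.

import numpy as np


class KnnCritic:
    """Value-function approximator: V(x) is the mean of the stored values
    of the k nearest previously visited states (a kNN averager)."""

    def __init__(self, k=5):
        self.k = k
        self.states = []   # visited states
        self.values = []   # corresponding value estimates

    def _neighbors(self, x):
        S = np.asarray(self.states)
        d = np.linalg.norm(S - x, axis=1)
        return np.argsort(d)[: min(self.k, len(d))]

    def predict(self, x):
        if not self.states:
            return 0.0
        return float(np.mean(np.asarray(self.values)[self._neighbors(x)]))

    def update(self, x, target, lr=0.5):
        # Pull the neighbors' stored values toward the TD target,
        # then add the new sample to the data set.
        if self.states:
            for i in self._neighbors(x):
                self.values[i] += lr * (target - self.values[i])
        self.states.append(np.array(x, dtype=float))
        self.values.append(float(target))


def utility(x, u, Q=1.0, R=0.1):
    """Quadratic stage cost (an assumed choice for this example)."""
    return Q * float(x @ x) + R * float(u @ u)


def plant(x, u, dt=0.05):
    """Placeholder nonlinear two-input dynamics standing in for the glider model."""
    x1, x2 = x
    dx = np.array([x2, -0.5 * np.sin(x1) - 0.2 * x2 + u[0] + 0.1 * u[1]])
    return x + dt * dx


def run_episode(steps=200, gamma=0.95, actor_lr=0.05, eps=1e-3):
    critic = KnnCritic(k=5)
    theta = np.zeros((2, 2))            # linear state-feedback actor: u = theta @ x
    x = np.array([1.0, 0.0])
    for _ in range(steps):
        # Small probing noise plays the role of persistent excitation,
        # keeping the visited data informative for the critic.
        u = theta @ x + 0.05 * np.random.randn(2)
        x_next = plant(x, u)

        # Critic step: temporal-difference target for the visited state.
        target = utility(x, u) + gamma * critic.predict(x_next)
        critic.update(x, target)

        # Actor step: finite-difference policy improvement on theta,
        # run synchronously with the critic update (online learning).
        u_nom = theta @ x
        q0 = utility(x, u_nom) + gamma * critic.predict(plant(x, u_nom))
        grad = np.zeros_like(theta)
        for i in range(theta.shape[0]):
            for j in range(theta.shape[1]):
                theta_p = theta.copy()
                theta_p[i, j] += eps
                u_p = theta_p @ x
                q_p = utility(x, u_p) + gamma * critic.predict(plant(x, u_p))
                grad[i, j] = (q_p - q0) / eps
        theta -= actor_lr * grad
        x = x_next
    return theta


if __name__ == "__main__":
    np.random.seed(0)
    print("learned feedback gains:\n", run_episode())
```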
