An Efficient Impulsive Adaptive Dynamic Programming Algorithm for Stochastic Systems

In this study, a novel general impulsive transition matrix is defined, which reveals the transition dynamics and the evolution of the probability distribution over all system states between two impulsive "events," rather than between two regular time indices. Based on this matrix, a policy-iteration-based impulsive adaptive dynamic programming (IADP) algorithm, together with a more efficient variant (EIADP), is developed to solve optimal impulsive control problems for discrete-time stochastic systems. By analyzing the monotonicity, stability, and convergence properties of the iterative value functions and control laws, it is proved that both the IADP and EIADP algorithms converge to the optimal impulsive performance index function. By dividing the whole impulsive policy into smaller pieces, the EIADP algorithm updates the iterative policies "piece by piece" according to the actual hardware constraints. This feature allows the algorithm to run on computing devices of all "sizes," including those with limited memory. A simulation experiment is conducted to validate the effectiveness of the proposed methods.
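To make the policy-iteration structure behind IADP/EIADP concrete, the following is a minimal Python sketch. It assumes a finite-state Markov chain whose transition matrices stand in for the state evolution between two impulsive events; all names, sizes, costs, and the discount factor are illustrative assumptions rather than the paper's formulation. The EIADP-style "piece-by-piece" update is mimicked by improving the policy over one subset of states at a time.

```python
import numpy as np

# Minimal sketch of policy-iteration-based impulsive ADP on a finite-state
# Markov chain. The setup below (state/impulse counts, random transition
# matrices, stage costs, discount factor) is a hypothetical stand-in for the
# paper's general impulsive transition matrix and performance index.

rng = np.random.default_rng(0)
n_states, n_impulses, gamma = 6, 3, 0.95   # assumed problem sizes and discount

# P[a, s, :] is an assumed transition distribution between two impulsive
# "events" when impulse a is applied in state s.
P = rng.dirichlet(np.ones(n_states), size=(n_impulses, n_states))
cost = rng.uniform(size=(n_states, n_impulses))   # stage cost g(s, a)

def evaluate(policy):
    """Policy evaluation: solve V = c_pi + gamma * P_pi V exactly."""
    P_pi = P[policy, np.arange(n_states)]          # rows follow the policy
    c_pi = cost[np.arange(n_states), policy]
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)

def improve(policy, V, piece):
    """Greedy improvement restricted to one 'piece' of the state space,
    mimicking the EIADP piece-by-piece update."""
    Q = cost + gamma * np.einsum("asj,j->sa", P, V)  # Q[s, a]
    new_policy = policy.copy()
    new_policy[piece] = Q[piece].argmin(axis=1)
    return new_policy

policy = np.zeros(n_states, dtype=int)
pieces = np.array_split(np.arange(n_states), 2)    # two policy "pieces"
for sweep in range(100):
    changed = False
    for piece in pieces:                           # update one piece at a time
        V = evaluate(policy)
        new_policy = improve(policy, V, piece)
        changed |= not np.array_equal(new_policy, policy)
        policy = new_policy
    if not changed:                                # stable over a full sweep
        break
print("converged impulsive policy:", policy)
```

Updating only a piece of the policy per evaluation keeps the working set small, which is the intuition behind running EIADP on memory-constrained devices; standard monotonicity arguments for policy iteration still apply, since each restricted greedy step cannot increase the value function.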
