Data-driven partially observable dynamic processes using adaptive dynamic programming

Adaptive dynamic programming (ADP) has been widely recognized as one of the "core methodologies" for achieving optimal control of intelligent systems modeled as Markov decision processes (MDPs). Generally, ADP control design requires complete knowledge of the system dynamics. In many practical situations, however, the measured input and output data represent only part of the system state, so complete state information is unavailable in many real-world cases, which narrows the range of application of ADP designs. In this paper, we propose a data-driven ADP method, based on neural network techniques, to stabilize systems with partially observable dynamics. A state network is integrated into the typical actor-critic architecture to provide an estimated state from the measured input/output sequences. Theoretical analysis and a stability discussion of this data-driven ADP method are also provided. Two examples are studied to verify the proposed method.
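The three-network layout described above can be illustrated with a minimal sketch: a state network reconstructs an estimate of the unmeasurable state from a window of past inputs/outputs, and the usual actor and critic networks operate on that estimate. The layer sizes, quadratic utility, discount factor, and update ordering below are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch (PyTorch) of a state-network + actor-critic architecture for a
# partially observable process. All dimensions and hyperparameters are assumptions.
import torch
import torch.nn as nn

class StateNet(nn.Module):
    """Estimates the unmeasurable state from a window of past inputs/outputs."""
    def __init__(self, io_window_dim, state_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(io_window_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, state_dim))
    def forward(self, io_window):
        return self.net(io_window)

class Actor(nn.Module):
    """Maps the estimated state to a control action."""
    def __init__(self, state_dim, action_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, action_dim))
    def forward(self, x_hat):
        return self.net(x_hat)

class Critic(nn.Module):
    """Approximates the cost-to-go J(x_hat, u)."""
    def __init__(self, state_dim, action_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))
    def forward(self, x_hat, u):
        return self.net(torch.cat([x_hat, u], dim=-1))

def update(state_net, actor, critic, io_t, io_next, y_t,
           critic_opt, actor_opt, gamma=0.95):
    """One illustrative update on a batch of measured I/O windows."""
    x_hat = state_net(io_t)          # estimated state at time t
    x_hat_next = state_net(io_next)  # estimated state at time t+1
    u = actor(x_hat)

    # Quadratic utility on measured output and control (an assumption here).
    utility = (y_t ** 2).sum(-1, keepdim=True) + (u ** 2).sum(-1, keepdim=True)

    # Critic (and state network): drive J(t) toward r(t) + gamma * J(t+1).
    target = (utility + gamma * critic(x_hat_next, actor(x_hat_next))).detach()
    critic_loss = ((critic(x_hat, u.detach()) - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: choose the control that minimizes the estimated cost-to-go.
    actor_loss = critic(state_net(io_t), actor(state_net(io_t))).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Illustrative wiring (all dimensions are assumptions):
state_net, actor, critic = StateNet(12, 4), Actor(4, 1), Critic(4, 1)
critic_opt = torch.optim.Adam(list(critic.parameters()) +
                              list(state_net.parameters()), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
```

In use, a fixed-length window would slide over the measured input/output sequence, be fed to the state network, and the actor's output applied as the control; the paper's own theoretical analysis addresses the stability of this closed loop, which the sketch does not.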
