A Survey of Approximate Dynamic Programming

Multi-stage decision problems under uncertainty are abundant in the process industries. The Markov decision process (MDP) is a general mathematical formulation of such problems. Although stochastic programming and dynamic programming are the standard methods for solving MDPs, their unwieldy computational requirements limit their usefulness in real applications. Approximate dynamic programming (ADP) combines simulation and function approximation to alleviate the "curse of dimensionality" associated with traditional dynamic programming. This paper introduces the ADP method, which abates the curse of dimensionality by solving the dynamic program over a carefully chosen, small subset of the state space, and surveys recent research directions within the field of ADP. A minimal illustrative sketch of the approach follows.
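The sketch below illustrates the two ingredients named in the abstract: simulation (Monte Carlo sampling of transitions in place of exact expectations) and function approximation (a value function fitted only on a small sampled subset of states). It is a generic fitted value iteration on a synthetic MDP, written as an assumption about how such a scheme might look, not the specific algorithm surveyed in the paper; all names and parameters are illustrative.

# Minimal sketch of approximate (fitted) value iteration on a toy MDP.
# Simulation replaces exact expectations; a linear approximator is fit
# only on a small sampled subset of the state space.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 200, 4, 0.95

# Random transition kernel P[s, a, s'] and reward table R[s, a] for the toy MDP.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def features(states):
    # Simple polynomial features of the normalized state index.
    x = np.asarray(states, dtype=float) / n_states
    return np.column_stack([np.ones_like(x), x, x**2, x**3])

w = np.zeros(4)                                          # weights of the linear value model
sampled = rng.choice(n_states, size=30, replace=False)   # small subset of the state space

for _ in range(100):
    targets = np.empty(len(sampled))
    for i, s in enumerate(sampled):
        q = np.empty(n_actions)
        for a in range(n_actions):
            # Simulate a handful of next states instead of summing over all of them.
            nxt = rng.choice(n_states, size=10, p=P[s, a])
            q[a] = R[s, a] + gamma * np.mean(features(nxt) @ w)
        targets[i] = q.max()                             # Bellman backup at the sampled state
    # Re-fit the approximator to the backed-up values (least-squares regression).
    w, *_ = np.linalg.lstsq(features(sampled), targets, rcond=None)

print("approximate values at a few sampled states:", (features(sampled) @ w)[:5])

In this sketch the per-iteration cost depends on the number of sampled states and simulated transitions, not on the full state space, which is how ADP-style methods sidestep the curse of dimensionality.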
