Performance Loss Bounds for Approximate Value Iteration with State Aggregation
[1] Rutherford Aris, et al. Discrete Dynamic Programming, 1965, The Mathematical Gazette.
[2] B. Fox. Discretizing dynamic programs, 1973.
[3] D. Bertsekas. Convergence of discretization procedures in dynamic programming, 1975.
[4] Ward Whitt, et al. Approximations of Dynamic Programs, I, 1978, Math. Oper. Res.
[5] Thomas L. Morin, et al. Computational Advances in Dynamic Programming, 1978.
[6] K. Hinderer. On Approximate Solutions of Finite-Stage Dynamic Programs, 1978.
[7] Roy Mendelssohn, et al. An Iterative Aggregation Procedure for Markov Decision Processes, 1982, Oper. Res.
[8] Sven Axsäter, et al. State aggregation in dynamic programming - An application to scheduling of independent jobs on parallel processors, 1983.
[9] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[10] John R. Birge, et al. Aggregation bounds in stochastic linear programming, 1985, Math. Program.
[11] Robert L. Smith, et al. Aggregation in Dynamic Programming, 1987, Oper. Res.
[12] D. Bertsekas, et al. Adaptive aggregation methods for infinite horizon dynamic programming, 1989.
[13] John N. Tsitsiklis, et al. The complexity of dynamic programming, 1989, J. Complex.
[14] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.
[15] J. Tsitsiklis, et al. An optimal one-way multigrid algorithm for discrete-time stochastic control, 1991.
[16] Michael I. Jordan, et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[17] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[18] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[19] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997.
[20] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[21] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[22] Jérôme Barraquand, et al. Numerical Valuation of High Dimensional Multivariate American Securities, 1995, Journal of Financial and Quantitative Analysis.
[23] Dimitri P. Bertsekas, et al. A Counterexample to Temporal Differences Learning, 1995, Neural Computation.
[24] Gavin Adrian Rummery. Problem solving with reinforcement learning, 1995.
[25] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996.
[26] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[27] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[28] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[29] Dimitri P. Bertsekas, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1997.
[30] Benjamin Van Roy. Learning and value function approximation in complex decision processes, 1998.
[31] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[32] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[33] Geoffrey J. Gordon, et al. Approximate solutions to Markov decision processes, 1999.
[34] H. Kushner. Numerical Methods for Stochastic Control Problems in Continuous Time, 2000.
[35] Benjamin Van Roy, et al. On the existence of fixed points for approximate value iteration and temporal-difference learning, 2000.
[36] Benjamin Van Roy, et al. Approximate Dynamic Programming via Linear Programming, 2001, NIPS.
[37] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst.
[38] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[39] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[40] John N. Tsitsiklis, et al. On Average Versus Discounted Reward Temporal-Difference Learning, 2002, Machine Learning.
[41] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[42] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[43] Satinder Singh, et al. An upper bound on the loss from approximate optimal-value functions, 1994, Machine Learning.
[44] Andrew W. Moore, et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, 2004, Machine Learning.
[45] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[46] Aggregation in Stochastic Dynamic Programming, 2004.
[47] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[48] A. Barto, et al. Improved Temporal Difference Methods with Linear Function Approximation, 2004.
[49] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[50] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[51] Warren B. Powell, et al. Handbook of Learning and Approximate Dynamic Programming, 2006, IEEE Transactions on Automatic Control.
[52] David Choi, et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, 2001, Discret. Event Dyn. Syst.
[53] R. Sutton. On the Virtues of Linear Learning and Trajectory Distributions, 2007.
[54] Dimitri P. Bertsekas, et al. Stochastic optimal control: the discrete time case, 2007.