An analysis of reinforcement learning with function approximation
暂无分享,去创建一个
[1] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[2] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[3] S. Meyn,et al. Computable Bounds for Geometric Convergence Rates of Markov Chains , 1994 .
[4] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[5] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[6] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[7] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[8] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[9] P. Diaconis,et al. LOGARITHMIC SOBOLEV INEQUALITIES FOR FINITE MARKOV CHAINS , 1996 .
[10] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[11] John N. Tsitsiklis,et al. Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.
[12] V. Borkar. Stochastic approximation with two time scales , 1997 .
[13] Richard S. Sutton,et al. Open Theoretical Questions in Reinforcement Learning , 1999, EuroCOLT.
[14] V. Borkar. A LEARNING ALGORITHM FOR DISCRETE-TIME STOCHASTIC CONTROL , 2000, Probability in the Engineering and Informational Sciences.
[15] Benjamin Van Roy,et al. On the existence of fixed points for approximate value iteration and temporal-difference learning , 2000 .
[16] Michael I. Jordan,et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation , 2001 .
[17] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[18] J. Rosenthal. QUANTITATIVE CONVERGENCE RATES OF MARKOV CHAINS: A SIMPLE ACCOUNT , 2002 .
[19] Doina Precup,et al. A Convergent Form of Approximate Policy Iteration , 2002, NIPS.
[20] Theodore J. Perkins,et al. On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains , 2002, ICML.
[21] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[22] Vladislav Tadic,et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation , 2001, Machine Learning.
[23] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[24] William D. Smart,et al. Interpolation-based Q-learning , 2004, ICML.
[25] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.