Adaptive bases for Q-learning
暂无分享,去创建一个
[1] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[2] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[3] John N. Tsitsiklis,et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives , 1999, IEEE Trans. Autom. Control..
[4] Simon Haykin,et al. Neural Networks: A Comprehensive Foundation , 1998 .
[5] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[6] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[7] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control 3rd Edition, Volume II , 2010 .
[8] Francisco S. Melo,et al. Convergence of Q-learning with linear function approximation , 2007, 2007 European Control Conference (ECC).
[9] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[10] V. Borkar. Stochastic approximation with two time scales , 1997 .
[11] Shie Mannor,et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res..
[12] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[13] Dimitri P. Bertsekas,et al. Basis function adaptation methods for cost approximation in MDP , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[14] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[15] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[16] John Odentrantz,et al. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues , 2000, Technometrics.
[17] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[18] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[19] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.