[1] R. Sutton,et al. A new Q(λ) with interim forward view and Monte Carlo equivalence , 2014 .
[2] Huizhen Yu,et al. On Convergence of Emphatic Temporal-Difference Learning , 2015, COLT.
[3] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[4] Matthieu Geist,et al. Off-policy learning with eligibility traces: a survey , 2013, J. Mach. Learn. Res..
[5] Shie Mannor,et al. Adaptive Lambda Least-Squares Temporal Difference Learning , 2016, 1612.09465.
[6] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[7] Nir Friedman,et al. Probabilistic Graphical Models - Principles and Techniques , 2009 .
[8] Huizhen Yu,et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions , 2012, SIAM J. Control. Optim..
[9] Martha White,et al. Investigating Practical Linear Temporal Difference Learning , 2016, AAMAS.
[10] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[11] Jun S. Liu,et al. Monte Carlo strategies in scientific computing , 2001 .
[12] Huizhen Yu,et al. Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize , 2015, J. Mach. Learn. Res..
[13] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[14] Richard S. Sutton,et al. Scaling life-long off-policy learning , 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[15] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[16] Marc G. Bellemare,et al. Q(λ) with Off-Policy Corrections , 2016, ALT.
[17] Doina Precup,et al. A new Q(λ) with interim forward view and Monte Carlo equivalence , 2014, ICML.
[18] Martha White,et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res..
[19] Richard S. Sutton,et al. Off-policy TD(λ) with a true online equivalence , 2014, UAI.
[20] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[21] Philip S. Thomas,et al. Safe Reinforcement Learning , 2015 .
[22] Richard S. Sutton,et al. True online TD(λ) , 2014, ICML.
[23] Patrick M. Pilarski,et al. True Online Temporal-Difference Learning , 2015, J. Mach. Learn. Res..
[24] Adam M. White,et al. Developing a Predictive Approach to Knowledge , 2015 .
[25] Martha White,et al. Emphatic Temporal-Difference Learning , 2015, ArXiv.
[26] Richard S. Sutton,et al. Weighted importance sampling for off-policy learning with linear function approximation , 2014, NIPS.
[27] Hado Philip van Hasselt,et al. Insights in reinforcement learning: formal analysis and empirical evaluation of temporal-difference learning algorithms , 2011 .
[28] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[29] Lihong Li,et al. Toward Minimax Off-policy Value Estimation , 2015, AISTATS.
[30] J. Hammersley. Simulation and the Monte Carlo Method , 1982 .
[31] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[32] Richard S. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[33] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..
[34] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[35] J. Hammersley,et al. Monte Carlo Methods , 1966 .
[36] Nan Jiang,et al. Doubly Robust Off-policy Evaluation for Reinforcement Learning , 2015, ArXiv.
[37] Marc G. Bellemare,et al. Q(λ) with Off-Policy Corrections , 2016 .
[38] Shie Mannor,et al. Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis , 2015, AAAI.
[39] Hado van Hasselt,et al. Insights in reinforcement learning: Formal analysis and empirical evaluation of temporal-difference learning , 2010 .
[40] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[41] Richard S. Sutton,et al. Off-policy learning based on weighted importance sampling with linear computational complexity , 2015, UAI.
[42] Martha White,et al. A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning , 2016, AAMAS.
[43] Shie Mannor,et al. Off-policy Model-based Learning under Unknown Factored Dynamics , 2015, ICML.
[44] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..