Martha White | Richard S. Sutton | Andrew Patterson | Adam White | Sina Ghiassian
[1] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[2] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[3] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[4] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, arXiv.
[5] Pascal Vincent, et al. Convergent Tree-Backup and Retrace with Function Approximation, 2017, ICML.
[6] Richard S. Sutton, et al. Multi-step Off-policy Learning Without Importance Sampling Ratios, 2017, arXiv.
[7] J. Zico Kolter, et al. The Fixed Points of Off-Policy TD, 2011, NIPS.
[8] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[9] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[10] Richard S. Sutton, et al. TD(λ) networks: temporal-difference networks with eligibility traces, 2005, ICML.
[11] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[12] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[13] Simon S. Du, et al. Stochastic Variance Reduction Methods for Policy Evaluation, 2017, ICML.
[14] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[15] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[16] Leah M. Hackman, et al. Faster Gradient-TD Algorithms, 2013.
[17] R. Sutton, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014.
[18] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[19] Shie Mannor, et al. Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis, 2015, AAAI.
[20] Doina Precup, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014, ICML.
[21] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996.
[22] Huizhen Yu, et al. Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize, 2015, J. Mach. Learn. Res.
[23] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[24] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[25] Huizhen Yu, et al. On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning, 2017, arXiv.
[26] Adam M. White, et al. Developing a Predictive Approach to Knowledge, 2015.
[27] Richard S. Sutton, et al. Off-policy TD(λ) with a true online equivalence, 2014, UAI.
[28] Huizhen Yu, et al. On Convergence of Emphatic Temporal-Difference Learning, 2015, COLT.
[29] Toby Walsh, et al. The Scaling of Search Cost, 1997, AAAI/IAAI.
[30] Matthieu Geist, et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[31] Richard S. Sutton, et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010, Artificial General Intelligence.
[32] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[33] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[34] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[35] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[36] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[37] A. Juditsky, et al. Solving variational inequalities with Stochastic Mirror-Prox algorithm, 2008, arXiv:0809.0815.
[38] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[39] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[40] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[41] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[42] Marek Petrik, et al. Proximal Gradient Temporal Difference Learning Algorithms, 2016, IJCAI.
[43] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[44] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.