Joelle Pineau | Doina Precup | Pierre Thodoroff | Lucas Caccia | Nishanth Anand