Shangtong Zhang | Vivek Veeriah | Shimon Whiteson