Rajesh Ranganath | William F. Whitney | Joan Bruna | David Brandfonbrener
[1] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, ArXiv.
[2] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[3] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, ArXiv.
[4] R. Quentin Grafton, et al. Truncated Normal Distribution, 2012.
[5] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[6] Philip Amortila, et al. A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting, 2020, ArXiv.
[7] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[8] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[9] Joelle Pineau, et al. Benchmarking Batch Deep Reinforcement Learning Algorithms, 2019, ArXiv.
[10] S. Kakade, et al. Reinforcement Learning: Theory and Algorithms, 2019.
[11] Romain Laroche, et al. Safe Policy Improvement with Baseline Bootstrapping, 2017, ICML.
[12] Nando de Freitas, et al. Hyperparameter Selection for Offline Reinforcement Learning, 2020, ArXiv.
[13] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[14] Razvan Pascanu, et al. Regularized Behavior Value Estimation, 2021, ArXiv.
[15] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[16] Marc G. Bellemare, et al. The Importance of Pessimism in Fixed-Dataset Policy Optimization, 2020, ArXiv.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[18] Ilya Kostrikov, et al. Offline Reinforcement Learning with Fisher Divergence Critic Regularization, 2021, ICML.
[19] Sergey Levine, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, ArXiv.
[20] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[21] Nando de Freitas, et al. Critic Regularized Regression, 2020, NeurIPS.
[22] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[23] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[24] Andrea Zanette, et al. Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL, 2020, ICML.
[25] Yu-Xiang Wang, et al. Imitation-Regularized Offline Learning, 2019, AISTATS.
[26] Scott Niekum, et al. You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL, 2021, CoRL.
[27] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[28] S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[29] Nan Jiang, et al. Information-Theoretic Considerations in Batch Reinforcement Learning, 2019, ICML.
[30] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, ArXiv.
[31] Sergio Gomez Colmenarejo, et al. RL Unplugged: Benchmarks for Offline Reinforcement Learning, 2020, ArXiv.
[32] Matthieu Geist, et al. Approximate Modified Policy Iteration, 2012, ICML.
[33] Ruosong Wang, et al. What are the Statistical Limits of Offline RL with Linear Function Approximation?, 2020, ICLR.
[34] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[35] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[36] Martin A. Riedmiller, et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning, 2020, ICLR.
[37] Hoang Minh Le, et al. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, 2019, NeurIPS Datasets and Benchmarks.
[38] Qing Wang, et al. Exponentially Weighted Imitation Learning for Batched Historical Data, 2018, NeurIPS.
[39] Mengdi Wang, et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation, 2020, ICML.
[40] Sergey Levine, et al. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations, 2017, Robotics: Science and Systems.
[41] Ilya Kostrikov, et al. AlgaeDICE: Policy Gradient from Arbitrary Experience, 2019, ArXiv.
[42] Romain Laroche, et al. Safe Policy Improvement with Soft Baseline Bootstrapping, 2019, ECML/PKDD.
[43] Ruosong Wang, et al. Instabilities of Offline RL with Pre-Trained Neural Representation, 2021, ICML.