Mohammad Norouzi | George Tucker | Ofir Nachum | Cosmin Paduraru | Tom Le Paine | Ziyu Wang | Michael R. Zhang
[1] Matthew W. Hoffman, et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[2] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Deep Reinforcement Learning, 2020, ICML.
[3] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[4] Pieter Abbeel, et al. Model-Ensemble Trust-Region Policy Optimization, 2018, ICLR.
[5] Yisong Yue, et al. Batch Policy Learning under Constraints, 2019, ICML.
[6] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[7] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[8] Bo Dai, et al. Off-Policy Evaluation via the Regularized Lagrangian, 2020, NeurIPS.
[9] Sergey Levine, et al. When to Trust Your Model: Model-Based Policy Optimization, 2019, NeurIPS.
[10] Martin A. Riedmiller, et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning, 2020, ICLR.
[11] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation, 2019, ICML.
[12] Nando de Freitas, et al. Hyperparameter Selection for Offline Reinforcement Learning, 2020, arXiv.
[13] Nando de Freitas, et al. Critic Regularized Regression, 2020, NeurIPS.
[14] Sergey Levine, et al. Benchmarks for Deep Off-Policy Evaluation, 2021, ICLR.
[15] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[16] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[17] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[18] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, 2019, NeurIPS.
[19] Sergey Levine, et al. Offline policy evaluation across representations with applications to educational games, 2014, AAMAS.
[20] Claude Sammut, et al. A Framework for Behavioural Cloning, 1995, Machine Intelligence 15.
[21] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[22] Roberto Calandra, et al. Objective Mismatch in Model-based Reinforcement Learning, 2020, L4DC.
[23] Yutaka Matsuo, et al. Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization, 2020, ICLR.
[24] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[25] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IROS.
[26] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[27] Peter Stone, et al. Importance Sampling Policy Evaluation with an Estimated Behavior Policy, 2018, ICML.
[28] Sergey Levine, et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.
[29] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[30] Bo Dai, et al. Batch Stationary Distribution Estimation, 2020, ICML.
[31] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[32] Hoang Minh Le, et al. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, 2019, NeurIPS Datasets and Benchmarks.
[33] Ilya Kostrikov, et al. Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation, 2020, arXiv.
[34] Sergey Levine, et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, 2017, ICRA.
[35] Evangelos Theodorou, et al. Model Predictive Path Integral Control using Covariance Variable Importance Sampling, 2015, arXiv.
[36] Lihong Li, et al. On Minimax Optimal Offline Policy Evaluation, 2014, arXiv.
[37] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[38] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, arXiv.
[39] Andrew W. Moore, et al. Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping, 1992, NIPS.
[40] Peter Stone, et al. Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation, 2016, AAAI.
[41] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[42] Jing Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adaptive Behavior.
[43] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[44] Pieter Abbeel, et al. Benchmarking Model-Based Reinforcement Learning, 2019, arXiv.
[45] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[46] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[47] Sergey Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, arXiv.
[48] Andrew W. Moore, et al. Locally Weighted Learning, 1997, Artificial Intelligence Review.
[49] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[50] Shie Mannor, et al. Off-policy Model-based Learning under Unknown Factored Dynamics, 2015, ICML.