暂无分享,去创建一个
[1] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[2] Seyed Kamyar Seyed Ghasemipour,et al. EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL , 2020, ICML.
[3] K. Jarrod Millman,et al. Array programming with NumPy , 2020, Nat..
[4] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[5] Filipe Wall Mutz,et al. Hindsight policy gradients , 2017, ICLR.
[6] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[7] Michael Fairbank,et al. Reinforcement Learning by Value Gradients , 2008, ArXiv.
[8] Bram Bakker,et al. Reinforcement Learning with Long Short-Term Memory , 2001, NIPS.
[9] Richard S. Sutton,et al. Sample-based learning and search with permanent and transient memories , 2008, ICML '08.
[10] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[11] S. Levine,et al. Accelerating Online Reinforcement Learning with Offline Datasets , 2020, ArXiv.
[12] Yu Bai,et al. Near-Optimal Offline Reinforcement Learning via Double Variance Reduction , 2021, ArXiv.
[13] Matthieu Geist,et al. Offline Reinforcement Learning with Pseudometric Learning , 2021, ICML.
[14] Gabriel Dulac-Arnold,et al. Model-Based Offline Planning , 2020, ArXiv.
[15] David Silver,et al. Memory-based control with recurrent neural networks , 2015, ArXiv.
[16] Pieter Abbeel,et al. Model-Ensemble Trust-Region Policy Optimization , 2018, ICLR.
[17] Razvan Pascanu,et al. Stabilizing Transformers for Reinforcement Learning , 2019, ICML.
[18] Mohammad Norouzi,et al. Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization , 2021, ICLR.
[19] Stefano Ermon,et al. Calibrated Model-Based Deep Reinforcement Learning , 2019, ICML.
[20] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[21] Sheila A. McIlraith,et al. Planning from Pixels using Inverse Dynamics Models , 2020, ICLR.
[22] Qi Liu,et al. Insertion-based Decoding with Automatically Inferred Generation Order , 2019, Transactions of the Association for Computational Linguistics.
[23] Jimmy Ba,et al. Exploring Model-based Planning with Policy Networks , 2019, ICLR.
[24] Ryan Cotterell,et al. If Beam Search Is the Answer, What Was the Question? , 2020, EMNLP.
[25] Sergey Levine,et al. Deep Dynamics Models for Learning Dexterous Manipulation , 2019, CoRL.
[26] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[27] Lantao Yu,et al. MOPO: Model-based Offline Policy Optimization , 2020, NeurIPS.
[28] Sergey Levine,et al. Learning to Reach Goals via Iterated Supervised Learning , 2019, ICLR.
[29] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.
[30] Oriol Vinyals,et al. Neural Discrete Representation Learning , 2017, NIPS.
[31] Glen Berseth,et al. DeepLoco , 2017, ACM Trans. Graph..
[32] Sergey Levine,et al. Offline Reinforcement Learning with Implicit Q-Learning , 2021, ICLR.
[33] Filipe Wall Mutz,et al. Training Agents using Upside-Down Reinforcement Learning , 2019, ArXiv.
[34] Sergey Levine,et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review , 2018, ArXiv.
[35] Andrew Gordon Wilson,et al. On the model-based stochastic value gradient for continuous reinforcement learning , 2020, L4DC.
[36] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[37] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[38] Yifan Wu,et al. Behavior Regularized Offline Reinforcement Learning , 2019, ArXiv.
[39] Yee Whye Teh,et al. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables , 2016, ICLR.
[40] Jerrod Parker,et al. Adaptive Transformers in RL , 2020, ArXiv.
[41] Martin A. Riedmiller,et al. Approximate model-assisted Neural Fitted Q-Iteration , 2014, 2014 International Joint Conference on Neural Networks (IJCNN).
[42] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[43] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[44] Sergey Levine,et al. Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings , 2018, ICML.
[45] S. Levine,et al. γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction , 2020, ArXiv.
[46] Liu Yang,et al. Long Range Arena: A Benchmark for Efficient Transformers , 2020, ICLR.
[47] Thorsten Joachims,et al. MOReL : Model-Based Offline Reinforcement Learning , 2020, NeurIPS.
[48] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[49] Sergey Levine,et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction , 2019, NeurIPS.
[50] Sergey Levine,et al. When to Trust Your Model: Model-Based Policy Optimization , 2019, NeurIPS.
[51] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[52] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[53] S. Levine,et al. Conservative Q-Learning for Offline Reinforcement Learning , 2020, NeurIPS.
[54] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[55] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.
[56] Sergey Levine,et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning , 2020, ArXiv.
[57] J. Andrew Bagnell,et al. Efficient Reductions for Imitation Learning , 2010, AISTATS.
[58] Siddhartha Banerjee,et al. Adaptive Discretization for Episodic Reinforcement Learning in Metric Spaces , 2019, Proc. ACM Meas. Anal. Comput. Syst..
[59] Honglak Lee,et al. Control of Memory, Active Perception, and Action in Minecraft , 2016, ICML.
[60] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[61] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[62] Daan Wierstra,et al. Recurrent Environment Simulators , 2017, ICLR.
[63] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[64] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[65] Pieter Abbeel,et al. Decision Transformer: Reinforcement Learning via Sequence Modeling , 2021, NeurIPS.
[66] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[67] Chelsea Finn,et al. Language as an Abstraction for Hierarchical Deep Reinforcement Learning , 2019, NeurIPS.
[68] Ruslan Salakhutdinov,et al. Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation , 2021, ICLR.
[69] Zhuoran Yang,et al. Is Pessimism Provably Efficient for Offline RL? , 2020, ICML.
[70] Juergen Schmidhuber,et al. Reinforcement Learning Upside Down: Don't Predict Rewards - Just Map Them to Actions , 2019, ArXiv.
[71] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.
[72] Lukasz Kaiser,et al. Reformer: The Efficient Transformer , 2020, ICLR.