Zita Marinho | David Silver | Matteo Hessel | Gregory Farquhar | Angelos Filos | Kate Baumli | Hado van Hasselt
[1] Erik Talvitie, et al. The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces, 2018, ArXiv.
[2] Mohammad Norouzi, et al. Mastering Atari with Discrete World Models, 2020, ICLR.
[3] Ondrej Bojar, et al. Improving Translation Model by Monolingual Data, 2011, WMT@EMNLP.
[4] Wenlong Fu, et al. Model-based reinforcement learning: A survey, 2018.
[5] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[6] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[7] K. I. M. McKinnon, et al. On the Generation of Markov Decision Processes, 1995.
[8] Sergey Levine, et al. Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation, 2018, IEEE International Conference on Robotics and Automation (ICRA).
[10] Sergey Levine, et al. Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model, 2019, NeurIPS.
[11] Leslie Pack Kaelbling, et al. Hierarchical task and motion planning in the now, 2011, IEEE International Conference on Robotics and Automation.
[13] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SIGART Bulletin.
[14] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[15] Matteo Hessel, et al. Podracer architectures for scalable Reinforcement Learning, 2021, ArXiv.
[16] Satinder Singh, et al. The Value Equivalence Principle for Model-Based Reinforcement Learning, 2020, NeurIPS.
[17] Jürgen Schmidhuber, et al. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments, 1990, IJCNN International Joint Conference on Neural Networks.
[18] David Silver, et al. Muesli: Combining Improvements in Policy Optimization, 2021, ICML.
[19] Rémi Munos, et al. Observe and Look Further: Achieving Consistent Performance on Atari, 2018, ArXiv.
[20] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[21] Amir-massoud Farahmand, et al. Iterative Value-Aware Model Learning, 2018, NeurIPS.
[22] Jürgen Schmidhuber, et al. World Models, 2018, ArXiv.
[23] Myle Ott, et al. Understanding Back-Translation at Scale, 2018, EMNLP.
[24] Petr Baudis, et al. PACHI: State of the Art Open Source Go Program, 2011, ACG.
[25] Razvan Pascanu, et al. Imagination-Augmented Agents for Deep Reinforcement Learning, 2017, NIPS.
[26] David Silver, et al. Online and Offline Reinforcement Learning by Planning with a Learned Model, 2021, NeurIPS.
[27] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[28] Tao Yu, et al. PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning, 2021, NeurIPS.
[29] Martin A. Riedmiller, et al. Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models, 2019, CoRL.
[30] R. Bellman. A Markovian Decision Process, 1957.
[31] Sergey Levine, et al. PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning, 2021, ICML.
[32] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[33] Satinder Singh, et al. Value Prediction Network, 2017, NIPS.
[34] Aaron van den Oord, et al. Shaping Belief States with Generative Environment Models for RL, 2019, NeurIPS.
[35] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[36] Mohammad Norouzi, et al. Dream to Control: Learning Behaviors by Latent Imagination, 2019, ICLR.
[37] Pieter Abbeel, et al. Value Iteration Networks, 2016, NIPS.
[39] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[40] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[41] Jun-Yan Zhu, et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, 2017, ICCV.
[42] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[43] Jessica B. Hamrick, et al. On the role of planning in model-based deep reinforcement learning, 2020, ArXiv.
[44] Daniel Nikovski, et al. Value-Aware Loss Function for Model-based Reinforcement Learning, 2017, AISTATS.
[45] Amir-massoud Farahmand, et al. Frequency-based Search-control in Dyna, 2020, ICLR.
[46] Joel Veness, et al. Monte-Carlo Planning in Large POMDPs, 2010, NIPS.
[47] Shimon Whiteson, et al. TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning, 2017, ICLR.
[48] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[49] Shimon Whiteson, et al. Deep Residual Reinforcement Learning, 2019, AAMAS.
[50] Sergey Levine, et al. Model-Based Reinforcement Learning for Atari, 2019, ICLR.
[51] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[52] Doina Precup, et al. Value-driven Hindsight Modelling, 2020, NeurIPS.
[53] Doina Precup, et al. Forethought and Hindsight in Credit Assignment, 2020, NeurIPS.
[54] Pravin Varaiya, et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.
[55] Matteo Hessel, et al. When to use parametric models in reinforcement learning?, 2019, NeurIPS.
[56] Honglak Lee, et al. Action-Conditional Video Prediction using Deep Networks in Atari Games, 2015, NIPS.
[57] Fabio Viola, et al. Causally Correct Partial Models for Reinforcement Learning, 2020, ArXiv.