Christopher Grimm | David Silver | Gregory Farquhar | Satinder Singh | André Barreto
[1] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[2] Amir-massoud Farahmand,et al. Iterative Value-Aware Model Learning , 2018, NeurIPS.
[3] Doina Precup,et al. Bounding Performance Loss in Approximate MDP Homomorphisms , 2008, NIPS.
[4] Ambuj Tewari,et al. Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles , 2019, AISTATS.
[5] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[6] Matteo Hessel,et al. Podracer architectures for scalable Reinforcement Learning , 2021, ArXiv.
[7] David Silver,et al. Online and Offline Reinforcement Learning by Planning with a Learned Model , 2021, NeurIPS.
[8] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[9] Rowan McAllister,et al. Learning Invariant Representations for Reinforcement Learning without Reconstruction , 2020, ICLR.
[10] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[11] Robin Milner,et al. Communication and concurrency , 1989, PHI Series in computer science.
[12] Joelle Pineau,et al. Learning Causal State Representations of Partially Observable Environments , 2019, ArXiv.
[13] Pablo Samuel Castro,et al. Scalable methods for computing state similarity in deterministic Markov Decision Processes , 2019, AAAI.
[14] Shimon Whiteson,et al. Deep Variational Reinforcement Learning for POMDPs , 2018, ICML.
[15] Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.
[16] Alborz Geramifard,et al. Reinforcement learning with misspecified model classes , 2013, 2013 IEEE International Conference on Robotics and Automation.
[17] Craig Boutilier,et al. Value-Directed Belief State Approximation for POMDPs , 2000, UAI.
[18] Robert Givan,et al. Equivalence notions and model minimization in Markov decision processes , 2003, Artif. Intell.
[19] Robert Givan,et al. Model Minimization in Markov Decision Processes , 1997, AAAI/IAAI.
[20] Joelle Pineau,et al. Combined Reinforcement Learning via Abstract Representations , 2018, AAAI.
[21] Wulfram Gerstner,et al. Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation , 2018, ICML.
[22] Lawson L. S. Wong,et al. Learning discrete state abstractions with deep variational inference , 2020, ArXiv.
[23] Mengdi Wang,et al. Model-Based Reinforcement Learning with Value-Targeted Regression , 2020, L4DC.
[24] Pieter Abbeel,et al. Value Iteration Networks , 2016, NIPS.
[25] Marlos C. Machado,et al. Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning , 2021, ICLR.
[26] Satinder Singh,et al. The Value Equivalence Principle for Model-Based Reinforcement Learning , 2020, NeurIPS.
[27] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res.
[28] Doina Precup,et al. Metrics for Finite Markov Decision Processes , 2004, AAAI.
[29] N. Nilsson. Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach , 1996 .
[30] Rémi Coulom,et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search , 2006, Computers and Games.
[31] Marc G. Bellemare,et al. DeepMDP: Learning Continuous Latent Space Models for Representation Learning , 2019, ICML.
[32] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[33] Satinder Singh,et al. Value Prediction Network , 2017, NIPS.
[34] Craig Boutilier,et al. Value-Directed Compression of POMDPs , 2002, NIPS.
[35] Shimon Whiteson,et al. TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning , 2017, ICLR.
[36] Tom Schaul,et al. The Predictron: End-To-End Learning and Planning , 2016, ICML.
[37] David Silver,et al. Muesli: Combining Improvements in Policy Optimization , 2021, ICML.
[38] Daniel Nikovski,et al. Value-Aware Loss Function for Model-based Reinforcement Learning , 2017, AISTATS.
[39] Balaraman Ravindran. Approximate Homomorphisms: A framework for non-exact minimization in Markov Decision Processes , 2022 .
[40] Frans A. Oliehoek,et al. Plannable Approximations to MDP Homomorphisms: Equivariance under Actions , 2020, AAMAS.
[41] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[42] Sergey Levine,et al. SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning , 2018, ICML.
[43] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.