Christopher Grimm | André Barreto | Satinder Singh | David Silver