Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994, Wiley Series in Probability and Statistics.
 Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
 Rémi Coulom,et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search , 2006, Computers and Games.
 Rémi Coulom,et al. Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength , 2008, Computers and Games.
 Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
 Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
 Sergey Levine,et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.
 Thomas B. Schön,et al. From Pixels to Torques: Policy Learning with Deep Dynamical Models , 2015, ICML 2015.
 Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
 Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
 Shane Legg,et al. Massively Parallel Methods for Deep Reinforcement Learning , 2015, ArXiv.
 Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
 Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
 Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
 Kevin Waugh,et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker , 2017, Science.
 Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2017, ICLR.
 Shimon Whiteson,et al. TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning , 2017, ICLR 2018.
 Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
 Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2018, J. Artif. Intell. Res..
 Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
 Rémi Munos,et al. Observe and Look Further: Achieving Consistent Performance on Atari , 2018, ArXiv.
 Jürgen Schmidhuber,et al. Recurrent World Models Facilitate Policy Evolution , 2018, NeurIPS.
 Weitang Liu,et al. Surprising Negative Results for Generative Adversarial Tree Search , 2018 .
 Noam Brown,et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals , 2018, Science.
 Mike Preuss,et al. Planning chemical syntheses with deep neural networks and symbolic AI , 2018, Nature.
 Fabio Viola,et al. Learning and Querying Fast Generative Models for Reinforcement Learning , 2018, ArXiv.
 Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
 Marc G. Bellemare,et al. DeepMDP: Learning Continuous Latent Space Models for Representation Learning , 2019, ICML.
 Matteo Hessel,et al. When to use parametric models in reinforcement learning? , 2019, NeurIPS.
 Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
 Rémi Munos,et al. Recurrent Experience Replay in Distributed Reinforcement Learning , 2019, ICLR.
 Karen Simonyan,et al. Off-Policy Actor-Critic with Shared Experience Replay , 2020, ICML.