Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1959, IBM J. Res. Dev..
 Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res..
 Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
 Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
 Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
 Learning from delayed rewards , 1989 .
 Terrence J. Sejnowski,et al. Temporal Difference Learning of Position Evaluation in the Game of Go , 1993, NIPS.
 Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
 Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
 Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
 Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
 Inman Harvey,et al. Noise and the Reality Gap: The Use of Simulation in Evolutionary Robotics , 1995, ECAL.
 Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
 John N. Tsitsiklis,et al. Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.
 Richard S. Sutton,et al. Roles of Macro-Actions in Accelerating Reinforcement Learning , 1998 .
 Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.
 Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
 Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
 Thomas G. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.
 Yoshua Bengio,et al. Convolutional networks for images, speech, and time series , 1998 .
 Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
 Jay H. Lee,et al. Model predictive control: past, present and future , 1999 .
 Geoffrey J. Gordon,et al. Approximate solutions to markov decision processes , 1999 .
 Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
 Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
 Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
 Peter L. Bartlett,et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..
 Clay B Holroyd,et al. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. , 2002, Psychological review.
 Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
 Joelle Pineau,et al. Point-based value iteration: An anytime algorithm for POMDPs , 2003, IJCAI.
 Remco R. Bouckaert,et al. Choosing Between Two Learning Algorithms Based on Calibrated Tests , 2003, ICML.
 Richard S. Sutton,et al. Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.
 Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
 Eibe Frank,et al. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms , 2004, PAKDD.
 Andrew W. Moore,et al. Variable Resolution Discretization in Optimal Control , 2002, Machine Learning.
 Anja Vogler,et al. An Introduction to Multivariate Statistical Analysis , 2004 .
 Colin Camerer,et al. Neuroeconomics: How Neuroscience Can Inform Economics , 2005 .
 R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
 A. Barto,et al. An algebraic approach to abstraction in reinforcement learning , 2004 .
 Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
 Longxin Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching , 2004, Machine Learning.
 Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
 Richard S. Sutton,et al. Learning to Predict by the Methods of Temporal Differences , 1988, Machine Learning.
 Olivier Teytaud,et al. Modiﬁcation of UCT with Patterns in Monte-Carlo Go , 2006 .
 Janez Demsar,et al. Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..
 Angela J. Yu,et al. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration , 2007, Philosophical Transactions of the Royal Society B: Biological Sciences.
 Csaba Szepesvári,et al. Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods , 2007, UAI.
 Michael L. Littman,et al. Efficient Reinforcement Learning with Relocatable Action Models , 2007, AAAI.
 P. Dayan,et al. Decision theory, reinforcement learning, and the brain , 2008, Cognitive, affective & behavioral neuroscience.
 P. Dayan,et al. Reinforcement learning: The Good, The Bad and The Ugly , 2008, Current Opinion in Neurobiology.
 Marek Petrik,et al. Biasing Approximate Dynamic Programming with a Lower Discount Factor , 2008, NIPS.
 Shimon Whiteson,et al. Automatic Feature Selection for Model-Based Reinforcement Learning in Factored MDPs , 2009, 2009 International Conference on Machine Learning and Applications.
 Pascal Vincent,et al. Visualizing Higher-Layer Features of a Deep Network , 2009 .
 Brian Tanner,et al. RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments , 2009, J. Mach. Learn. Res..
 Theoretical and Empirical Studies of Learning , 2009 .
 Monica Dinculescu,et al. Approximate Predictive Representations of Partially Observable Systems , 2010, ICML.
 Masashi Sugiyama,et al. Nonparametric Return Distribution Approximation for Reinforcement Learning , 2010, ICML.
 Jürgen Schmidhuber,et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.
 Shimon Whiteson,et al. Protecting against evaluation overfitting in empirical reinforcement learning , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
 Rémi Munos,et al. Pure exploration in finitely-armed and continuous-armed bandits , 2011, Theor. Comput. Sci..
 Yi Sun,et al. Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments , 2011, AGI.
 Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
 Wei Chu,et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms , 2010, WSDM '11.
 Ian D. Watson,et al. Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft:Broodwar , 2012, 2012 IEEE Conference on Computational Intelligence and Games (CIG).
 H. Seo,et al. Neural basis of reinforcement learning and decision making. , 2012, Annual review of neuroscience.
 Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
 Anton Nekrutenko,et al. Ten Simple Rules for Reproducible Computational Research , 2013, PLoS Comput. Biol..
 Louis Wehenkel,et al. Batch mode reinforcement learning based on the synthesis of artificial trajectories , 2013, Ann. Oper. Res..
 Pieter Abbeel,et al. Learning from Demonstrations Through the Use of Non-rigid Registration , 2013, ISRR.
 Qiang Yang,et al. Lifelong Machine Learning Systems: Beyond Learning Algorithms , 2013, AAAI Spring Symposium: Lifelong Machine Learning.
 Kris K. Hauser,et al. Artificial intelligence framework for simulating clinical decision-making: A Markov decision process approach , 2013, Artif. Intell. Medicine.
 Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
 Sergey Levine,et al. Offline policy evaluation across representations with applications to educational games , 2014, AAMAS.
 Christoph Salge,et al. Changing the Environment Based on Empowerment as Intrinsic Motivation , 2014, Entropy.
 Ronald Ortner,et al. Selecting Near-Optimal Approximate State Representations in Reinforcement Learning , 2014, ALT.
 Matthew E. Taylor,et al. Multi-objectivization of reinforcement learning problems by reward shaping , 2014, 2014 International Joint Conference on Neural Networks (IJCNN).
 R. Dolan,et al. Does temporal discounting explain unhealthy behavior? A systematic review and reinforcement learning perspective , 2014, Front. Behav. Neurosci..
 Nan Jiang,et al. Abstraction Selection in Model-based Reinforcement Learning , 2015, ICML.
 Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.
 Thomas B. Schön,et al. From Pixels to Torques: Policy Learning with Deep Dynamical Models , 2015, ICML 2015.
 Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
 Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
 D. Curran‐Everett,et al. The fickle P value generates irreproducible results , 2015, Nature Methods.
 Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
 Nan Jiang,et al. The Dependence of Effective Planning Horizon on Model Accuracy , 2015, AAMAS.
 Shakir Mohamed,et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning , 2015, NIPS.
 Damien Ernst,et al. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies , 2015, ArXiv.
 Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
 Jianfeng Gao,et al. Recurrent Reinforcement Learning: A Hybrid Approach , 2015, ArXiv.
 Zoran Popovic,et al. Interactive Control of Diverse Complex Characters with Neural Networks , 2015, NIPS.
 Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
 Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.
 Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
 Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
 Alborz Geramifard,et al. RLPy: a value-function-based reinforcement learning framework for education and research , 2015, J. Mach. Learn. Res..
 Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
 Sergey Levine,et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.
 Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
 Tianqi Chen,et al. Net2Net: Accelerating Learning via Knowledge Transfer , 2015, ICLR.
 Yann LeCun,et al. Deep multi-scale video prediction beyond mean square error , 2015, ICLR.
 Ruslan Salakhutdinov,et al. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2016, ICLR.
 Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
 Filip De Turck,et al. VIME: Variational Information Maximizing Exploration , 2016, NIPS.
 Marc'Aurelio Ranzato,et al. Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.
 David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
 Roy Fox,et al. Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.
 Alex Graves,et al. Strategic Attentive Writer for Learning Macro-Actions , 2016, NIPS.
 Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
 Rob Fergus,et al. Learning Multiagent Communication with Backpropagation , 2016, NIPS.
 Honglak Lee,et al. Control of Memory, Active Perception, and Action in Minecraft , 2016, ICML.
 Sergey Levine,et al. Adapting Deep Visuomotor Representations with Weak Pairwise Constraints , 2015, WAFR.
 Damien Ernst,et al. Deep Reinforcement Learning Solutions for Energy Microgrids Management , 2016 .
 Jürgen Schmidhuber,et al. A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots , 2016, IEEE Robotics and Automation Letters.
 Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
 Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
 Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
 Peter L. Bartlett,et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.
 Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
 Wojciech Jaskowski,et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning , 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).
 Katja Hofmann,et al. The Malmo Platform for Artificial Intelligence Experimentation , 2016, IJCAI.
 Quoc V. Le,et al. Neural Programmer: Inducing Latent Programs with Gradient Descent , 2015, ICLR.
 Li Li,et al. Traffic signal timing via deep reinforcement learning , 2016, IEEE/CAA Journal of Automatica Sinica.
 Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
 Florian Richoux,et al. TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games , 2016, ArXiv.
 Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
 Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
 Sergey Levine,et al. Unsupervised Learning for Physical Interaction through Video Prediction , 2016, NIPS.
 Zoubin Ghahramani,et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.
 Razvan Pascanu,et al. A simple neural network module for relational reasoning , 2017, NIPS.
 Marc G. Bellemare,et al. The Reactor: A Sample-Efficient Actor-Critic Architecture , 2017, ArXiv.
 Damien Ernst,et al. Approximate Bayes Optimal Policy Search using Neural Networks , 2017, ICAART.
 Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
 Louis Wehenkel,et al. Machine learning of real-time power systems reliability management response , 2017, 2017 IEEE Manchester PowerTech.
 Li Fei-Fei,et al. Inferring and Executing Programs for Visual Reasoning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
 Shimon Whiteson,et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.
 Shie Mannor,et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.
 Pieter Abbeel,et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning , 2016, ICLR.
 Learning Robust Dialog Policies in Noisy Environments , 2017, ArXiv.
 Chris Sauer,et al. Beating Atari with Natural Language Guided Reinforcement Learning , 2017, ArXiv.
 Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
 Razvan Pascanu,et al. Sim-to-Real Robot Learning from Pixels with Progressive Nets , 2016, CoRL.
 Razvan Pascanu,et al. Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.
 Sergey Levine,et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2017, ICLR.
 Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
 Yuandong Tian,et al. ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games , 2017, NIPS.
 Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.
 Wojciech Zaremba,et al. Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
 Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
 Tom Schaul,et al. StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.
 Guan Wang,et al. Interactive Learning from Policy-Dependent Human Feedback , 2017, ICML.
 Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.
 Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
 Sebastian Ruder,et al. An Overview of Multi-Task Learning in Deep Neural Networks , 2017, ArXiv.
 Cewu Lu,et al. Virtual to Real Reinforcement Learning for Autonomous Driving , 2017, BMVC.
 Ali Farhadi,et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
 Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
 Marlos C. Machado,et al. A Laplacian Framework for Option Discovery in Reinforcement Learning , 2017, ICML.
 Emma Brunskill,et al. Sample Efficient Feature Selection for Factored MDPs , 2017, ArXiv.
 Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
 Dileep George,et al. Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics , 2017, ICML.
 Jun Wang,et al. Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games , 2017, ArXiv.
 Glen Berseth,et al. DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning , 2017, ACM Trans. Graph..
 Christopher Burgess,et al. DARLA: Improving Zero-Shot Transfer in Reinforcement Learning , 2017, ICML.
 Kevin Waugh,et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker , 2017, Science.
 Tsuneo Kato,et al. “Re:ROS”: Prototyping of Reinforcement Learning Environment for Asynchronous Cognitive Architecture , 2017, BICA 2017.
 Yuandong Tian,et al. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning , 2016, ICLR.
 Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
 Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
 Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
 Pieter Abbeel,et al. Equivalence Between Policy Gradients and Soft Q-Learning , 2017, ArXiv.
 Gregory Dudek,et al. Benchmark Environments for Multitask Learning in Continuous Domains , 2017, ArXiv.
 Samy Bengio,et al. Neural Combinatorial Optimization with Reinforcement Learning , 2016, ICLR.
 Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2017, ICLR.
 Shimon Whiteson,et al. TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning , 2017, ICLR 2018.
 Peter Henderson,et al. Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control , 2017, ArXiv.
 Guillaume Lample,et al. Playing FPS Games with Deep Reinforcement Learning , 2016, AAAI.
 Sergey Levine,et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
 Sergey Levine,et al. (CAD)$^2$RL: Real Single-Image Flight without a Single Real Image , 2016, Robotics: Science and Systems.
 Youyong Kong,et al. Deep Direct Reinforcement Learning for Financial Signal Representation and Trading , 2017, IEEE Transactions on Neural Networks and Learning Systems.
 Dopamine , 2018, Reactions Weekly.
 Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
 Daniel L. K. Yamins,et al. Learning to Play with Intrinsically-Motivated Self-Aware Agents , 2018, NeurIPS.
 Nando de Freitas,et al. Intrinsic Social Motivation via Causal Influence in Multi-Agent RL , 2018, ArXiv.
 Samy Bengio,et al. A Study on Overfitting in Deep Reinforcement Learning , 2018, ArXiv.
 Pieter Abbeel,et al. Automatic Goal Generation for Reinforcement Learning Agents , 2017, ICML.
 Atil Iscen,et al. Sim-to-Real: Learning Agile Locomotion For Quadruped Robots , 2018, Robotics: Science and Systems.
 Nando de Freitas,et al. Playing hard exploration games by watching YouTube , 2018, NeurIPS.
 Sergio Gomez Colmenarejo,et al. One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL , 2018, ArXiv.
 Sergey Levine,et al. Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
 Xiaohui Ye,et al. Horizon: Facebook's Open Source Applied Reinforcement Learning Platform , 2018, ArXiv.
 Hyrum S. Anderson,et al. The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation , 2018, ArXiv.
 Joelle Pineau,et al. A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning , 2018, ArXiv.
 Yee Whye Teh,et al. An Analysis of Categorical Distributional Reinforcement Learning , 2018, AISTATS.
 Peter Stone,et al. Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces , 2017, AAAI.
 Guy Lever,et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning , 2018, AAMAS.
 Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.
 Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
 Marcin Andrychowicz,et al. Asymmetric Actor Critic for Image-Based Robot Learning , 2017, Robotics: Science and Systems.
 Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
 Marc G. Bellemare,et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning , 2018, ICLR.
 Joelle Pineau,et al. Decoupling Dynamics and Reward for Transfer Learning , 2018, ICLR.
 Marc G. Bellemare,et al. Distributional Reinforcement Learning with Quantile Regression , 2017, AAAI.
 Wojciech Czarnecki,et al. Multi-task Deep Reinforcement Learning with PopArt , 2019, AAAI.
 Yevgen Chebotar,et al. Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience , 2018, 2019 International Conference on Robotics and Automation (ICRA).
 Joelle Pineau,et al. Combined Reinforcement Learning via Abstract Representations , 2019, AAAI.
 Damien Ernst,et al. On overfitting and asymptotic bias in batch reinforcement learning with partial observability , 2019, J. Artif. Intell. Res..
 Nando de Freitas,et al. Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning , 2019, ICML.