Reward learning from human preferences and demonstrations in Atari
Shane Legg | Jan Leike | Borja Ibarz | Dario Amodei | Tobias Pohlen | Geoffrey Irving
[1] S. Hochreiter,et al. Reinforcement Driven Information Acquisition in Nondeterministic Environments , 1995 .
[2] Jürgen Schmidhuber,et al. Optimal Artificial Curiosity, Creativity, Music, and the Fine Arts , 2005 .
[3] Sergey Levine,et al. Diversity is All You Need: Learning Skills without a Reward Function , 2018, ICLR.
[4] Tom Everitt,et al. Towards Safe Artificial General Intelligence , 2018 .
[5] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[6] Laurent Orseau,et al. Universal Knowledge-Seeking Agents for Stochastic Environments , 2013, ALT.
[7] Harri Valpola,et al. Weight-averaged consistency targets improve semi-supervised deep learning results , 2017, ArXiv.
[8] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[9] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[10] Peter Stone,et al. Interactively shaping agents via human reinforcement: the TAMER framework , 2009, K-CAP '09.
[11] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[12] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[13] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res.
[14] Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.
[15] Guan Wang,et al. Interactive Learning from Policy-Dependent Human Feedback , 2017, ICML.
[16] Romain Laroche,et al. Score-based Inverse Reinforcement Learning , 2016, AAMAS.
[17] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[18] Huimin Ma,et al. Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations , 2018, ArXiv.
[19] Shakir Mohamed,et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning , 2015, NIPS.
[20] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.
[21] Risto Miikkulainen,et al. The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities , 2018, Artificial Life.
[22] Alan Fern,et al. A Bayesian Approach for Policy Learning from Trajectory Preference Queries , 2012, NIPS.
[23] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[24] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[25] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[26] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[27] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.
[28] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[29] Nuttapong Chentanez,et al. Intrinsically Motivated Reinforcement Learning , 2004, NIPS.
[30] R. A. Bradley,et al. Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons , 1952 .
[31] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[32] A. Elo. The rating of chessplayers, past and present , 1978 .
[33] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[34] Owain Evans,et al. Trial without Error: Towards Safe Reinforcement Learning via Human Intervention , 2017, AAMAS.
[35] Daan Wierstra,et al. Variational Intrinsic Control , 2016, ICLR.
[36] Nando de Freitas,et al. Reinforcement and Imitation Learning for Diverse Visuomotor Skills , 2018, Robotics: Science and Systems.
[37] Michèle Sebag,et al. APRIL: Active Preference-learning based Reinforcement Learning , 2012, ECML/PKDD.
[38] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[39] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[40] Johannes Fürnkranz,et al. A Survey of Preference-Based Reinforcement Learning Methods , 2017, J. Mach. Learn. Res.
[41] Johannes Fürnkranz,et al. Model-Free Preference-Based Reinforcement Learning , 2016, AAAI.
[42] Marcin Andrychowicz,et al. Overcoming Exploration in Reinforcement Learning with Demonstrations , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[43] Farbod Fahimi,et al. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning , 2011, 2011 IEEE International Conference on Rehabilitation Robotics.
[44] R. A. Bradley,et al. Rank Analysis of Incomplete Block Designs , 1952 .
[45] Patrick M. Pilarski,et al. Actor-Critic Reinforcement Learning with Simultaneous Human Control and Feedback , 2017, ArXiv.
[46] John Schulman,et al. Concrete Problems in AI Safety , 2016, ArXiv.
[47] Christoph Salge,et al. Empowerment - an Introduction , 2013, ArXiv.
[48] Oliver Kroemer,et al. Active reward learning with a novel acquisition function , 2015, Auton. Robots.
[49] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[50] Johannes Fürnkranz,et al. Preference-Based Reinforcement Learning: A Preliminary Survey , 2013 .
[51] Marc G. Bellemare,et al. Distributional Reinforcement Learning with Quantile Regression , 2017, AAAI.
[52] Peter Stone,et al. Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces , 2017, AAAI.
[53] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.
[54] Mark O. Riedl,et al. Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds , 2017, ArXiv.
[55] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[56] Martin A. Riedmiller,et al. Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards , 2017, ArXiv.
[57] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.