Deep Reinforcement Learning from Human Preferences
Shane Legg | Jan Leike | Dario Amodei | Tom B. Brown | Paul F. Christiano | Miljan Martic
[1] Farbod Fahimi,et al. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning , 2011, 2011 IEEE International Conference on Rehabilitation Robotics.
[2] Oliver Kroemer,et al. Active Reward Learning , 2014, Robotics: Science and Systems.
[3] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[4] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[5] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[6] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[7] Stuart Russell. Should We Fear Supersmart Robots? , 2016, Scientific American.
[8] Johannes Fürnkranz,et al. Model-Free Preference-Based Reinforcement Learning , 2016, AAAI.
[9] Oliver Kroemer,et al. Active reward learning with a novel acquisition function , 2015, Auton. Robots.
[10] R. Shepard. Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space , 1957 .
[11] Peter Stone,et al. Learning non-myopically from human-generated reward , 2013, IUI '13.
[12] Christopher D. Manning,et al. Learning Language Games through Interaction , 2016, ACL.
[13] R. A. Bradley,et al. Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons , 1952 .
[14] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[15] Michèle Sebag,et al. Programming by Feedback , 2014, ICML.
[16] Sergey Levine,et al. Generalizing Skills with Semi-Supervised Reinforcement Learning , 2016, ICLR.
[17] R. Duncan Luce,et al. Individual Choice Behavior: A Theoretical Analysis , 1979 .
[18] Michèle Sebag,et al. APRIL: Active Preference-learning based Reinforcement Learning , 2012, ECML/PKDD.
[19] John Schulman,et al. Concrete Problems in AI Safety , 2016, ArXiv.
[20] Pieter Abbeel,et al. Third-Person Imitation Learning , 2017, ICLR.
[21] Alan Fern,et al. A Bayesian Approach for Policy Learning from Trajectory Preference Queries , 2012, NIPS.
[22] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[23] Michèle Sebag,et al. Preference-Based Policy Learning , 2011, ECML/PKDD.
[24] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[25] Tom Schaul,et al. Learning from Demonstrations for Real World Reinforcement Learning , 2017, ArXiv.
[26] A. Elo. The rating of chessplayers, past and present , 1978 .
[27] Sebastian Risi,et al. Breeding a diversity of Super Mario behaviors through interactive evolution , 2016 .
[28] Nick Bostrom,et al. Superintelligence: Paths, Dangers, Strategies , 2014 .
[29] Guan Wang,et al. Interactive Learning from Policy-Dependent Human Feedback , 2017, ICML.
[30] Romain Laroche,et al. Score-based Inverse Reinforcement Learning , 2016, AAMAS.
[31] I. C. Parmee,et al. Introducing Machine Learning within an Interactive Evolutionary Design Environment , 2006 .
[32] R. A. Bradley,et al. Rank Analysis of Incomplete Block Designs , 1952 .
[33] Johannes Fürnkranz,et al. Preference-Based Reinforcement Learning: A Preliminary Survey , 2013 .
[34] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[35] Peter Stone,et al. Interactively shaping agents via human reinforcement: the TAMER framework , 2009, K-CAP '09.
[36] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[37] Anca D. Dragan,et al. Cooperative Inverse Reinforcement Learning , 2016, NIPS.
[38] Jimmy Secretan,et al. Picbreeder: evolving pictures collaboratively online , 2008, CHI.
[39] Hiroaki Sugiyama,et al. Preference-learning based Inverse Reinforcement Learning for Dialog Control , 2012, INTERSPEECH.
[40] Nando de Freitas,et al. A Bayesian interactive optimization approach to procedural animation design , 2010, SCA '10.
[41] Eyke Hüllermeier,et al. Preference-based reinforcement learning: a formal framework and a policy iteration algorithm , 2012, Mach. Learn..
[42] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[43] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[44] Sergey Levine,et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.
[45] W. Bradley Knox,et al. Learning from human-generated reward , 2012 .