Pitfalls of learning a reward function online
Laurent Orseau | Shane Legg | Stuart Armstrong | Jan Leike