Benefits of Assistance over Reward Learning
Stuart J. Russell | P. Abbeel | Rohin Shah | A. Dragan | Dmitrii Krasheninnikov | Michael Dennis | Rachel Freedman | Lawrence Chan | Pedro Freire
[1] P. Randolph. Bayesian Decision Problems and Markov Chains, 1968.
[2] M. Allais, et al. The So-Called Allais Paradox and Rational Decisions under Uncertainty, 1979.
[3] John McCarthy, et al. Some Philosophical Problems from the Standpoint of Artificial Intelligence, 1987.
[4] John A. List, et al. Preference Learning in Consecutive Experimental Auctions, 2000.
[5] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[6] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[7] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.
[8] Nando de Freitas, et al. Active Preference Learning with Discrete Choice Data, 2007, NIPS.
[9] Sriraam Natarajan, et al. A Decision-Theoretic Model of Assistance, 2007, IJCAI.
[10] Stephen M. Omohundro, et al. The Basic AI Drives, 2008, AGI.
[11] Emilio Frazzoli, et al. Intention-Aware Motion Planning, 2013, WAFR.
[12] Leslie Pack Kaelbling, et al. POMCoP: Belief Space Planning for Sidekicks in Cooperative Games, 2012, AIIDE.
[13] Joseph Y. Halpern, et al. Game theory with translucent players, 2013, TARK.
[14] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[15] Stefanos Nikolaidis, et al. Human-robot cross-training: Computational formulation, modeling and evaluation of a human team training strategy, 2013, HRI.
[16] Oliver Kroemer, et al. Active Reward Learning, 2014, Robotics: Science and Systems.
[17] Siddhartha S. Srinivasa, et al. Shared Autonomy via Hindsight Optimization, 2015, Robotics: Science and Systems.
[18] Anca D. Dragan, et al. Cooperative Inverse Reinforcement Learning, 2016, NIPS.
[19] R. Cohn. Maximizing Expected Value of Information in Decision Problems by Querying on a Wish-to-Know Basis, 2016.
[20] John Schulman, et al. Concrete Problems in AI Safety, 2016, arXiv.
[21] Michael C. Frank, et al. Pragmatic Language Interpretation as Probabilistic Inference, 2016.
[22] Fiery Cushman, et al. Showing versus doing: Teaching by demonstration, 2016, NIPS.
[23] Alexandra Chouldechova, et al. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments, 2016, Big Data.
[24] Nishant Desai. Uncertain Reward-Transition MDPs for Negotiable Reinforcement Learning, 2017.
[25] Avi Feller, et al. Algorithmic Decision Making and the Cost of Fairness, 2017, KDD.
[26] Anca D. Dragan, et al. Inverse Reward Design, 2017, NIPS.
[27] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[28] Christos Dimitrakakis, et al. Multi-View Decision Processes: The Helper-AI Problem, 2017, NIPS.
[29] Anca D. Dragan, et al. The Off-Switch Game, 2016, IJCAI.
[30] Johannes Fürnkranz, et al. A Survey of Preference-Based Reinforcement Learning Methods, 2017, J. Mach. Learn. Res.
[31] Matthias Grossglauser, et al. Just Sort It! A Simple and Effective Approach to Active Preference Learning, 2015, ICML.
[32] Anca D. Dragan, et al. Learning Robot Objectives from Physical Human Interaction, 2017, CoRL.
[33] Edmund H. Durfee, et al. Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes, 2017, ICAPS.
[34] N. Bostrom. Superintelligence: Paths, Dangers, Strategies, 2014.
[35] Anca D. Dragan, et al. Active Preference-Based Learning of Reward Functions, 2017, Robotics: Science and Systems.
[36] Jon M. Kleinberg, et al. Inherent Trade-Offs in the Fair Determination of Risk Scores, 2016, ITCS.
[37] Sergey Levine, et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, 2017, ICLR.
[38] Dylan Hadfield-Menell, et al. Active Inverse Reward Design, 2018, arXiv.
[39] Ryan Carey, et al. Incorrigibility in the CIRL Framework, 2017, AIES.
[40] Anca D. Dragan, et al. An Efficient, Generalized Bellman Update for Cooperative Inverse Reinforcement Learning, 2018, ICML.
[41] J. Clune, et al. The Surprising Creativity of Digital Evolution, 2018, ALIFE.
[42] Stuart Armstrong, et al. Occam's razor is insufficient to infer the preferences of irrational agents, 2017, NeurIPS.
[43] Alexander Matt Turner. Optimal Farsighted Agents Tend to Seek Power, 2019, arXiv.
[44] Stuart Russell. Human Compatible: Artificial Intelligence and the Problem of Control, 2019.
[45] Sergey Levine, et al. From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following, 2019, ICLR.
[46] Daniel Szafir, et al. Balanced Information Gathering and Goal-Oriented Actions in Shared Autonomy, 2019, HRI.
[47] Dorsa Sadigh, et al. Asking Easy Questions: A User-Friendly Approach to Active Reward Learning, 2019, CoRL.
[48] Anca D. Dragan, et al. Preferences Implicit in the State of the World, 2018, ICLR.
[49] Anca D. Dragan, et al. On the Utility of Learning about Humans for Human-AI Coordination, 2019, NeurIPS.
[50] Hong Jun Jeon, et al. Reward-rational (implicit) choice: A unifying formalism for reward learning, 2020, NeurIPS.
[51] Karol Hausman, et al. Learning to Interactively Learn and Assist, 2019, AAAI.
[52] D. Kulić, et al. Active Preference Learning using Maximum Regret, 2020, IROS.
[53] Dylan Hadfield-Menell, et al. Conservative Agency via Attainable Utility Preservation, 2019, AIES.
[54] Laurent Orseau, et al. Pitfalls of learning a reward function online, 2020, IJCAI.