Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences
Dorsa Sadigh | Gleb Shevchuk | Dylan P. Losey | Erdem Biyik | Malayandi Palan | Nicholas C. Landolfi
[1] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[2] Stephen L. Smith, et al. Bayesian Active Learning for Collaborative Task Specification Using Equivalence Regions, 2019, IEEE Robotics and Automation Letters.
[3] Dorsa Sadigh, et al. Asking Easy Questions: A User-Friendly Approach to Active Reward Learning, 2019, CoRL.
[4] Dorsa Sadigh, et al. Batch Active Preference-Based Learning of Reward Functions, 2018, CoRL.
[5] Mykel J. Kochenderfer, et al. Learning an Urban Air Mobility Encounter Model from Expert Preferences, 2019, 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC).
[6] Michèle Sebag, et al. APRIL: Active Preference-learning based Reinforcement Learning, 2012, ECML/PKDD.
[7] Joel W. Burdick, et al. ROIAL: Region of Interest Active Learning for Characterizing Exoskeleton Gait Preference Landscapes, 2021, 2021 IEEE International Conference on Robotics and Automation (ICRA).
[8] Anca D. Dragan, et al. Learning from Physical Human Corrections, One Feature at a Time, 2018, 2018 13th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[9] Dorsa Sadigh, et al. The Green Choice: Learning and Influencing Human Decisions on Shared Roads, 2019, 2019 IEEE 58th Conference on Decision and Control (CDC).
[10] R. Luce, et al. Individual Choice Behavior: A Theoretical Analysis, 1960.
[11] Scott Niekum, et al. Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations, 2019, CoRL.
[12] Todd Kulesza, et al. Structured labeling for facilitating concept evolution in machine learning, 2014, CHI.
[13] Matthew Gombolay, et al. Learning from Suboptimal Demonstration via Self-Supervised Reward Regression, 2020, arXiv.
[14] Anca D. Dragan, et al. On the Utility of Model Learning in HRI, 2019, 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[15] Mukesh Singhal, et al. Do You Want Your Autonomous Car to Drive Like You?, 2015, 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[16] Shane Legg, et al. Reward learning from human preferences and demonstrations in Atari, 2018, NeurIPS.
[17] Stefanos Nikolaidis, et al. Efficient Model Learning for Human-Robot Collaborative Tasks, 2014, arXiv.
[18] Dorsa Sadigh, et al. Active Learning of Reward Dynamics from Hierarchical Queries, 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[19] Siddhartha S. Srinivasa, et al. Human preferences for robot-human hand-over configurations, 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[20] Prabhat Nagarajan, et al. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations, 2019, ICML.
[21] K. S. Krishnan. Incorporating Thresholds of Indifference in Probabilistic Choice Models, 1977.
[22] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[23] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[24] Tsuyoshi Okita. Conference Report: The 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 2020.
[25] Siddhartha S. Srinivasa, et al. Shared Autonomy via Hindsight Optimization, 2015, Robotics: Science and Systems.
[26] Scott Sanner, et al. Real-time Multiattribute Bayesian Preference Elicitation with Pairwise Comparison Queries, 2010, AISTATS.
[27] Anca D. Dragan, et al. Active Preference-Based Learning of Reward Functions, 2017, Robotics: Science and Systems.
[28] R. Duncan Luce. Individual Choice Behavior: A Theoretical Analysis, 1979.
[29] Thomas L. Griffiths, et al. A rational model of preference learning and choice prediction by children, 2008, NIPS.
[30] Jonathan P. How, et al. Bayesian Nonparametric Inverse Reinforcement Learning, 2012, ECML/PKDD.
[31] Maya Cakmak, et al. Keyframe-based Learning from Demonstration, 2012, Int. J. Soc. Robotics.
[32] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[33] Anca D. Dragan, et al. Learning Robot Objectives from Physical Human Interaction, 2017, CoRL.
[34] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[35] Ankit Shah, et al. Interactive Robot Training for Non-Markov Tasks, 2020, arXiv.
[36] Daniel King, et al. Fetch & Freight: Standard Platforms for Service Robot Applications, 2016.
[37] Aaron D. Ames, et al. Preference-Based Learning for Exoskeleton Gait Optimization, 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).
[38] Nicholas Roy, et al. Inferring Task Goals and Constraints using Bayesian Nonparametric Inverse Reinforcement Learning, 2019, CoRL.
[39] Stefanos Nikolaidis, et al. Efficient Model Learning from Joint-Action Demonstrations for Human-Robot Collaborative Tasks, 2015, 2015 10th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[40] Mykel J. Kochenderfer, et al. Bayesian Preference Elicitation for Multiobjective Engineering Design Optimization, 2015, J. Aerosp. Inf. Syst.
[41] Scott Niekum, et al. Deep Bayesian Reward Learning from Preferences, 2019, arXiv.
[42] Dylan P. Losey, et al. Here’s What I’ve Learned: Asking Questions that Reveal Reward Learning, 2021, ACM Trans. Hum. Robot Interact.
[43] Mykel J. Kochenderfer, et al. Preference-based Learning of Reward Function Features, 2021, arXiv.
[44] Anca D. Dragan, et al. Learning under Misspecified Objective Spaces, 2018, CoRL.
[45] P. Dayan, et al. Cortical substrates for exploratory decisions in humans, 2006, Nature.
[46] Siddhartha S. Srinivasa, et al. Formalizing Assistive Teleoperation, 2012, Robotics: Science and Systems.
[47] Pieter Abbeel, et al. Exploration and apprenticeship learning in reinforcement learning, 2005, ICML.
[48] Wojciech Zaremba, et al. OpenAI Gym, 2016, arXiv.
[49] Dorsa Sadigh, et al. Learning Reward Functions by Integrating Human Demonstrations and Preferences, 2019, Robotics: Science and Systems.
[50] Anca D. Dragan, et al. Planning for Autonomous Cars that Leverage Effects on Human Actions, 2016, Robotics: Science and Systems.
[51] M. L. Fisher, et al. An analysis of approximations for maximizing submodular set functions—I, 1978, Math. Program.
[52] Wei Chu, et al. Gaussian Processes for Ordinal Regression, 2005, J. Mach. Learn. Res.
[53] Craig Boutilier, et al. Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets, 2010, NIPS.
[54] Siddhartha S. Srinivasa, et al. Active Comparison Based Learning Incorporating User Uncertainty and Noise, 2016.
[55] Mark D. Uncles, et al. Discrete Choice Analysis: Theory and Application to Travel Demand, 1987.
[56] Dorsa Sadigh, et al. Learning Human Objectives from Sequences of Physical Corrections, 2021, 2021 IEEE International Conference on Robotics and Automation (ICRA).
[57] Eyal Amir, et al. Bayesian Inverse Reinforcement Learning, 2007, IJCAI.
[58] Dorsa Sadigh, et al. When Humans Aren’t Optimal: Robots that Collaborate with Risk-Aware Humans, 2020, 2020 15th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[59] Nir Ailon, et al. An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity, 2010, J. Mach. Learn. Res.
[60] Katherine J. Kuchenbecker, et al. Data-Driven Motion Mappings Improve Transparency in Teleoperation, 2015, PRESENCE: Teleoperators and Virtual Environments.
[61] Nima Anari, et al. Batch Active Learning Using Determinantal Point Processes, 2019, arXiv.
[62] Mykel J. Kochenderfer, et al. Active preference-based Gaussian process regression for reward learning and optimization, 2020, Robotics: Science and Systems.
[63] Moshe Ben-Akiva, et al. Discrete Choice Analysis: Theory and Application to Travel Demand, 1985.
[64] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[65] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.