Repeated Inverse Reinforcement Learning

We introduce a novel problem, repeated Inverse Reinforcement Learning, in which the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks in which it surprises the human by acting suboptimally with respect to how the human would have acted. Each time the human is surprised, the agent is provided with a demonstration of the desired behavior. We formalize this problem, including how the sequence of tasks is chosen, in several different ways and provide some foundational results.

Nan Jiang | Kareem Amin | Satinder P. Singh
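As a rough illustration of the interaction protocol described in the abstract, the Python sketch below simulates it under simplifying assumptions that are not taken from the paper. Each task is a one-step choice among actions described by feature vectors, the human's reward is linear in those features with an unknown parameter `true_theta`, and the agent keeps a finite list of candidate parameters (`hypotheses`) that is assumed to contain the true one. Whenever the agent surprises the human, it receives the human's choice as a demonstration and discards every candidate under which that choice is not optimal. The names `run_repeated_irl`, `best_action` and `dot` are hypothetical, and this simple version-space learner is a stand-in for illustration, not the algorithm analyzed in the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical simplification: reward parameters and actions are both feature vectors.
Vector = Tuple[float, ...]


def dot(a: Vector, b: Vector) -> float:
    """Linear reward of feature vector b under parameter a."""
    return sum(x * y for x, y in zip(a, b))


def best_action(theta: Vector, actions: Sequence[Vector]) -> int:
    """Index of the highest-reward action under theta (ties go to the lowest index)."""
    return max(range(len(actions)), key=lambda i: dot(theta, actions[i]))


def run_repeated_irl(
    true_theta: Vector,
    hypotheses: Sequence[Vector],
    tasks: Sequence[Sequence[Vector]],
    eps: float = 1e-9,
) -> int:
    """Simulate the repeated-task protocol and return the number of surprises.

    Assumes true_theta is among the hypotheses, so the version space never empties.
    """
    version_space: List[Vector] = list(hypotheses)
    surprises = 0
    for actions in tasks:
        # The agent acts on behalf of the human using its current guess.
        choice = best_action(version_space[0], actions)
        optimal_value = max(dot(true_theta, a) for a in actions)
        # A surprise: the agent's choice is suboptimal under the human's true reward.
        if dot(true_theta, actions[choice]) < optimal_value - eps:
            surprises += 1
            # The human demonstrates the desired (optimal) action.
            demo = best_action(true_theta, actions)
            # Keep only candidates under which the demonstrated action is optimal.
            version_space = [
                h
                for h in version_space
                if dot(h, actions[demo]) >= max(dot(h, a) for a in actions) - eps
            ]
    return surprises
```

For example, with `true_theta = (0.0, 1.0)`, `hypotheses = [(1.0, 0.0), (0.0, 1.0)]` and two identical tasks offering the actions `(1.0, 0.0)` and `(0.0, 1.0)`, the agent surprises the human once on the first task, learns from the demonstration, and then acts as the human would on the second task.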
