This report describes a generalized apprenticeship learning protocol for reinforcement-learning agents with access to a teacher. The teacher interacts with the agent by providing policy traces (transition and reward observations). We characterize sufficient conditions on the underlying models for efficient apprenticeship learning and link these criteria to two established learnability classes (KWIK and Mistake Bound). We demonstrate our approach on a conjunctive learning task that would be too slow to learn in the autonomous setting. We show that the agent can guarantee near-optimal performance with only a polynomial number of examples from a human teacher and can learn efficiently in real-world environments with sensor imprecision and
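As a concrete illustration of the Mistake Bound connection, the sketch below shows how a monotone conjunction over boolean state features could be learned from a teacher's policy trace. This is a minimal sketch under assumed interfaces, not the paper's algorithm: the function name, the (state, label) trace format, and the feature encoding are all hypothetical.

```python
# Minimal sketch: a mistake-bound learner for a monotone conjunction,
# driven by labeled examples drawn from a teacher's trace. (Illustrative
# only; the trace format here is an assumption, not the paper's protocol.)

def learn_conjunction_from_trace(num_features, teacher_trace):
    """Learn a monotone conjunction over boolean features.

    Starts with the most specific hypothesis (all features required) and
    removes any feature contradicted by a positive example from the trace.
    Each mistake eliminates at least one feature, so the learner makes at
    most num_features mistakes.
    """
    hypothesis = set(range(num_features))  # feature indices still believed required
    mistakes = 0
    for state, label in teacher_trace:     # label: does the target condition hold?
        prediction = all(state[i] for i in hypothesis)
        if prediction != label:
            mistakes += 1
            if label:
                # False negative: drop features that are 0 in this positive example.
                hypothesis -= {i for i in hypothesis if not state[i]}
        # False positives cannot occur: the hypothesis always contains the
        # target's features, so predicting positive implies the target holds.
    return hypothesis, mistakes


# Hypothetical usage: the target conjunction is x0 AND x2 over 4 features.
trace = [((1, 0, 1, 0), True), ((1, 1, 1, 1), True), ((0, 1, 1, 0), False)]
h, m = learn_conjunction_from_trace(4, trace)
# h == {0, 2}: the first positive example eliminates x1 and x3 (one mistake).
```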