Efficient Apprenticeship Learning with Smart Humans

This report describes a generalized apprenticeship learning protocol for reinforcement-learning agents with access to a teacher. The teacher interacts with the agent by providing policy traces (transition and reward observations). We characterize sufficient conditions on the underlying models for efficient apprenticeship learning and link these criteria to two established learnability classes (KWIK and Mistake Bound). We demonstrate our approach on a conjunctive learning task that would be too slow to learn in the autonomous setting. We show that the agent can guarantee near-optimal performance with only a polynomial number of examples from a human teacher and can efficiently learn in real-world environments with sensor imprecision and
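To make the conjunctive setting concrete, the following is a minimal sketch of the classical mistake-bound elimination learner for monotone conjunctions over boolean features. This is an illustration of the Mistake Bound class referenced above, not the paper's exact protocol; all names are our own.

```python
class ConjunctionLearner:
    """Mistake-bound elimination learner for monotone conjunctions.

    Starts with the conjunction of all n variables and drops any
    variable contradicted by a positive example, so it makes at
    most n + 1 mistakes in total (illustrative sketch).
    """

    def __init__(self, n):
        self.hypothesis = set(range(n))  # conjunction of all n variables
        self.mistakes = 0

    def predict(self, x):
        # Predict True iff every variable in the hypothesis is set.
        return all(x[i] for i in self.hypothesis)

    def update(self, x, label):
        # On a mistaken prediction for a positive example, eliminate
        # every hypothesis variable that the example sets to 0.
        if self.predict(x) != label:
            self.mistakes += 1
            if label:
                self.hypothesis -= {i for i in self.hypothesis if not x[i]}


# Example: the hidden target is x0 AND x2 over 4 features.
learner = ConjunctionLearner(4)
learner.update([1, 0, 1, 0], True)   # positive example prunes x1, x3
```

In the apprenticeship setting, the informative examples come from the teacher's traces rather than from autonomous exploration, which is what lets the agent avoid the exponential cost of discovering positive examples on its own.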