Discriminative Apprenticeship Learning with Both Preference and Non-preference Behavior

Expert demonstrations are usually suboptimal, and failed demonstrations often carry useful guidance. This paper therefore proposes a Discriminative Apprenticeship Learning algorithm in which the apprentice is also taught with failed attempts. The apprentice thus learns to discriminate between preference and non-preference cases, so that it can actively take the corresponding action. Because robots usually operate in changing environments, the algorithm accounts for generalization ability: the reward function is recovered by evaluating its generalization error. Representation error is also analyzed and incorporated into the algorithm, and a theoretical guarantee on the algorithm's performance is presented. Experiments on a simple car-driving robot, including comparisons with a variety of inverse reinforcement learning methods, show that the proposed method is an effective and promising alternative.
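The abstract does not give the learning objective. The sketch below only illustrates the general idea of discriminating preference from non-preference behavior, and it is not the paper's algorithm. It assumes a linear reward r(s) = w·φ(s) and a hinge-loss margin between the discounted feature expectations of preferred and non-preferred trajectories. An L2 penalty stands in, as an assumption, for controlling generalization error. All function names, hyperparameters, and the two car-driving features (`on_road`, `collision`) are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def feature_expectation(trajectory: Sequence[Vector], gamma: float = 0.9) -> Vector:
    """Discounted sum of state features phi(s_t) along one trajectory."""
    mu = [0.0] * len(trajectory[0])
    for t, phi in enumerate(trajectory):
        for i, v in enumerate(phi):
            mu[i] += (gamma ** t) * v
    return mu


def learn_reward_weights(preferred, non_preferred, gamma=0.9, lam=0.01,
                         lr=0.1, epochs=200) -> Vector:
    """Hypothetical margin learner: push w.mu_pref above w.mu_nonpref by >= 1.

    Minimizes lam * ||w||^2 + mean over pairs of max(0, 1 - w.(mu_p - mu_n))
    with plain subgradient descent. The L2 term is an assumed stand-in for
    generalization control, not the paper's criterion.
    """
    mus_p = [feature_expectation(tr, gamma) for tr in preferred]
    mus_n = [feature_expectation(tr, gamma) for tr in non_preferred]
    d = len(mus_p[0])
    w = [0.0] * d
    pairs = [(p, n) for p in mus_p for n in mus_n]
    for _ in range(epochs):
        # Gradient of the L2 regularizer.
        grad = [2.0 * lam * wi for wi in w]
        for mp, mn in pairs:
            diff = [a - b for a, b in zip(mp, mn)]
            if dot(w, diff) < 1.0:  # margin violated: hinge term is active
                for i in range(d):
                    grad[i] -= diff[i] / len(pairs)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical features per state: [on_road, collision].
    preferred = [[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]]      # stays on the road
    non_preferred = [[[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]]  # failed attempt: collides
    w = learn_reward_weights(preferred, non_preferred)
    print("learned reward weights:", w)
```

With these toy trajectories, the learned weights reward staying on the road and penalize collisions, because the failed attempt supplies the contrast that the preferred demonstration alone lacks.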