Bayesian Multitask Inverse Reinforcement Learning

We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each demonstration may come from one expert trying to solve a different task, or from different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn efficiently from multiple experts but also to effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers.
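To make the setting concrete, the sketch below runs a generic Bayesian IRL loop in the spirit of the abstract: demonstrations from two experts on a small chain MDP, a Boltzmann-rational likelihood for expert behaviour, and a shared Gaussian prior mean that loosely ties the per-task rewards together, so the posterior can separate the two experts' goals. The chain MDP, the softmax likelihood, the Metropolis-Hastings sampler, and all names are illustrative assumptions; the paper's structured priors and its prior on policy optimality are richer than this minimal hierarchy.

```python
# Minimal sketch (not the paper's exact model): Bayesian IRL over multiple
# demonstrations, with a shared Gaussian prior mean tying the tasks together.
import numpy as np

S, A, gamma = 5, 2, 0.95          # chain MDP: 5 states, move left / move right

def transition(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right."""
    return max(s - 1, 0) if a == 0 else min(s + 1, S - 1)

def q_values(reward, iters=100):
    """Value iteration for state rewards; returns Q(s, a)."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[reward[transition(s, a)] + gamma * V[transition(s, a)]
                       for a in range(A)] for s in range(S)])
        V = Q.max(axis=1)
    return Q

def log_likelihood(reward, demo, beta=5.0):
    """Boltzmann-rational expert model: P(a | s) proportional to exp(beta * Q(s, a))."""
    q = beta * q_values(reward)
    q -= q.max(axis=1, keepdims=True)            # numerical stability
    logp = q - np.log(np.exp(q).sum(axis=1, keepdims=True))
    return sum(logp[s, a] for s, a in demo)

def posterior_sample(demo, mu, n_steps=1000, step=0.2, seed=0):
    """Metropolis-Hastings over one task's reward, unit Gaussian prior at mu."""
    rng = np.random.default_rng(seed)
    r = mu.copy()
    lp = log_likelihood(r, demo) - 0.5 * ((r - mu) ** 2).sum()
    for _ in range(n_steps):
        r_new = r + step * rng.normal(size=S)
        lp_new = log_likelihood(r_new, demo) - 0.5 * ((r_new - mu) ** 2).sum()
        if np.log(rng.random()) < lp_new - lp:   # accept/reject
            r, lp = r_new, lp_new
    return r

# Two experts with possibly different goals: one heads right, one heads left.
demos = [[(2, 1), (3, 1), (4, 1)],
         [(2, 0), (1, 0), (0, 0)]]
mu = np.zeros(S)                  # shared prior mean couples the two tasks
for i, demo in enumerate(demos):
    r = posterior_sample(demo, mu)
    print(f"expert {i}: inferred reward peaks at state {r.argmax()}")
```

Under these assumptions the sampler recovers reward functions peaking at opposite ends of the chain, i.e. it differentiates the two experts' goals while sharing a common prior, which is the qualitative behaviour the abstract describes.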
