Adaptable replanning with compressed linear action models for learning from demonstrations

We propose an adaptable and efficient model-based reinforcement learning approach well suited for continuous domains with sparse samples, a setting often encountered when learning from demonstrations. The flexibility of our method originates from approximate transition models estimated from data and from the proposed online replanning approach. Together, these components allow immediate adaptation to a new task, given in the form of a reward function. The efficiency of our method comes from two approximations. First, rather than representing a complete distribution over the results of taking an action, which is difficult in continuous state spaces, it learns a linear model of the expected transition for each action. Second, it uses a novel strategy for compressing these linear action models, which significantly reduces the space and time required to learn the models and supports efficient online generation of open-loop plans. The effectiveness of these methods is demonstrated in a simulated driving domain with a 20-dimensional continuous input space.
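The sketch below illustrates the general idea of per-action linear expected-transition models combined with open-loop replanning, under stated assumptions: the class and function names (`LinearActionModel`, `plan_open_loop`), the `rank` parameter, and the use of a truncated SVD for compression and random shooting for planning are illustrative stand-ins, not the paper's specific compression strategy or planner.

```python
import numpy as np

# Minimal sketch (assumptions noted above): per-action linear models of the
# expected next feature vector, with an optional low-rank compression step
# and open-loop planning by random shooting against a given reward function.

class LinearActionModel:
    """Least-squares model of the expected next feature vector for one action."""

    def __init__(self, dim, rank=None):
        self.dim = dim
        self.rank = rank                 # optional compression rank (assumed parameter)
        self.F = np.zeros((dim, dim))    # expected-transition matrix
        self.b = np.zeros(dim)           # offset term

    def fit(self, X, X_next):
        # Solve min ||[X, 1] W - X_next||^2 with a single least-squares call.
        Xa = np.hstack([X, np.ones((X.shape[0], 1))])
        W, *_ = np.linalg.lstsq(Xa, X_next, rcond=None)
        self.F, self.b = W[:-1].T, W[-1]
        if self.rank is not None:
            # Compress F via truncated SVD (illustrative stand-in for the
            # paper's compression strategy).
            U, s, Vt = np.linalg.svd(self.F, full_matrices=False)
            k = self.rank
            self.F = (U[:, :k] * s[:k]) @ Vt[:k]

    def predict(self, x):
        return self.F @ x + self.b


def plan_open_loop(models, reward_fn, x0, horizon=10, n_samples=200, seed=None):
    """Random-shooting search over open-loop action sequences using the models."""
    rng = np.random.default_rng(seed)
    actions = list(models.keys())
    best_seq, best_ret = None, -np.inf
    for _ in range(n_samples):
        seq = rng.choice(actions, size=horizon)
        x, ret = x0, 0.0
        for a in seq:
            x = models[a].predict(x)     # expected next features under action a
            ret += reward_fn(x)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq, best_ret
```

A new task is handled by swapping in a different `reward_fn` and replanning online with the same learned models; no model relearning is required, which is the source of the adaptability described above.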
