Agnostic System Identification for Monte Carlo Planning

While model-based reinforcement learning is often studied under the assumption that a fully accurate model is contained within the model class, this is rarely true in practice. When the model class may be fundamentally limited, it can be difficult to obtain theoretical guarantees. Under some conditions, the DAgger algorithm promises a policy nearly as good as the plan obtained from the most accurate model in the class, but only if the planning algorithm is near-optimal, which is also rarely the case in complex problems. This paper explores the interaction between DAgger and Monte Carlo planning, showing in particular that DAgger may perform poorly when coupled with a sub-optimal planner. A novel variation of DAgger designed for use with Monte Carlo planning is derived and shown to behave far better in some cases where DAgger fails.
