Exploring compact reinforcement-learning representations with linear regression

This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm to learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object-Oriented MDPs, none of which had previously been shown to be efficiently learnable in the reinforcement-learning setting. We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored-MDP transition and reward functions together.
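Since the abstract names only the KWIK guarantee, the following is a minimal sketch of what a KWIK online linear regressor looks like in use: on each query the learner either returns an accurate prediction or outputs ⊥ ("I don't know"), and only the ⊥ events, after which the true label is revealed, count against its KWIK bound. The ridge-style confidence-width test, the class name KWIKLinearRegression, and the ridge and uncertainty_threshold parameters below are illustrative assumptions, not the paper's algorithm or its improved bounds.

```python
import numpy as np

class KWIKLinearRegression:
    """Illustrative KWIK-style online linear regressor (a sketch, not the
    paper's algorithm). Predicts theta^T x when the query direction is
    well covered by past data; otherwise returns None, the KWIK "I don't
    know" symbol, and learns from the label revealed afterwards."""

    def __init__(self, dim, ridge=1.0, uncertainty_threshold=0.1):
        self.A = ridge * np.eye(dim)   # regularized Gram matrix X^T X + ridge*I
        self.b = np.zeros(dim)         # accumulated X^T y
        self.threshold = uncertainty_threshold

    def predict(self, x):
        A_inv = np.linalg.inv(self.A)
        # Confidence width in the direction of x (ridge-regression style);
        # a large width means x is poorly covered by the inputs seen so far.
        width = np.sqrt(x @ A_inv @ x)
        if width > self.threshold:
            return None                # KWIK's "I don't know" (⊥)
        theta = A_inv @ self.b         # regularized least-squares estimate
        return float(theta @ x)

    def update(self, x, y):
        # Called after a ⊥, when the environment reveals the label y;
        # the KWIK complexity bound counts how often this can happen.
        self.A += np.outer(x, x)
        self.b += y * x
```

In the model-based RL applications described above, such a learner would take, say, the feature vector of a factored-MDP state as x and the observed reward as y; inputs on which it answers ⊥ can be treated optimistically, R-MAX style, to drive exploration.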
