Model-Based Reinforcement Learning in Factored-State MDPs

We consider the problem of learning in a factored-state Markov decision process (MDP) that is structured to allow a compact representation. We show that the well-known factored Rmax algorithm performs near-optimally on all but a number of timesteps that is polynomial in the size of the compact representation, which is often exponentially smaller than the number of states. This matches the result obtained by Kearns and Koller for their DBN-E3 algorithm, except that our analysis is conducted in a more general setting. We also extend the results to a new algorithm, factored IE, which uses the interval estimation approach to exploration and can be expected to outperform factored Rmax on most domains.
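Because the sample-complexity claim rests on counting experience per local parent configuration rather than per flat state, a minimal sketch of that bookkeeping may help. The Python below assumes a known DBN structure (each variable's parent set); the class name, the threshold `m`, and the specific bonus form are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict

class FactoredRmaxCounts:
    """Hypothetical sketch of factored-Rmax-style experience counting."""

    def __init__(self, parent_sets, m):
        self.parent_sets = parent_sets  # parent_sets[i]: variable indices that variable i depends on
        self.m = m                      # samples needed before a local factor counts as "known"
        self.counts = defaultdict(int)

    def observe(self, state, action):
        # One observed transition contributes one sample to each variable's
        # local conditional distribution P(x_i' | parents(x_i), a).
        for i, parents in enumerate(self.parent_sets):
            self.counts[(i, action, tuple(state[j] for j in parents))] += 1

    def is_known(self, state, action):
        # Rmax-style rule: (s, a) is known only when every local parent
        # configuration has at least m samples; unknown pairs are treated
        # as maximally rewarding, which drives directed exploration.
        return all(
            self.counts[(i, action, tuple(state[j] for j in parents))] >= self.m
            for i, parents in enumerate(self.parent_sets)
        )

    def exploration_bonus(self, state, action, beta=1.0):
        # Factored-IE flavor (our reading of interval estimation, not the
        # paper's exact form): replace the all-or-nothing known test with a
        # per-factor confidence-width bonus that shrinks as counts grow.
        return sum(
            beta / math.sqrt(max(
                self.counts[(i, action, tuple(state[j] for j in parents))], 1))
            for i, parents in enumerate(self.parent_sets)
        )
```

A small usage example under the same assumptions, with three binary variables where each depends on itself and its left neighbor:

```python
model = FactoredRmaxCounts(parent_sets=[(0,), (0, 1), (1, 2)], m=5)
model.observe((0, 1, 0), action=0)
print(model.is_known((0, 1, 0), 0), model.exploration_bonus((0, 1, 0), 0))
```

The key point the sketch illustrates is that the number of count cells, and hence the exploration cost, scales with the number of parent configurations in the compact representation rather than with the exponentially larger flat state space.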

[1] Leslie G. Valiant. A theory of the learnable. CACM, 1984.

[2] Leslie Pack Kaelbling. Learning in embedded systems. 1993.

[3] Craig Boutilier, et al. Context-Specific Independence in Bayesian Networks. UAI, 1996.

[4] Michael L. Littman. Probabilistic Propositional Planning: Representations and Complexity. AAAI/IAAI, 1997.

[5] Jürgen Schmidhuber, et al. Efficient model-based exploration. 1998.

[6] Michael L. Littman, et al. The Computational Complexity of Probabilistic Planning. J. Artif. Intell. Res., 1998.

[7] Daphne Koller, et al. Computing Factored Value Functions for Policies in Structured MDPs. IJCAI, 1999.

[8] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage. J. Artif. Intell. Res., 1999.

[9] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs. IJCAI, 1999.

[10] Alexander Russell, et al. A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics. NIPS, 2002.

[11] Dale Schuurmans, et al. Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs. ICML, 2002.

[12] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning. J. Mach. Learn. Res., 2001.

[13] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution. 2003.

[14] Sham M. Kakade. On the sample complexity of reinforcement learning. PhD thesis, 2003.

[15] Michael L. Littman, et al. An empirical evaluation of interval estimation for Markov decision processes. 16th IEEE International Conference on Tools with Artificial Intelligence, 2004.

[16] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time. Machine Learning, 2002.

[17] Michael L. Littman, et al. A theoretical analysis of Model-Based Interval Estimation. ICML, 2005.

[18] Richard S. Sutton, et al. Reinforcement Learning: An Introduction. IEEE Transactions on Neural Networks, 2005.

[19] Olivier Sigaud, et al. Learning the structure of Factored Markov Decision Processes in reinforcement learning problems. ICML, 2006.

[20] Lihong Li, et al. Incremental Model-based Learners With Formal Learning-Time Guarantees. UAI, 2006.

[21] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes. J. Comput. Syst. Sci., 2008.