Model-Based Reinforcement Learning in Factored-State MDPs

We consider the problem of learning in a factored-state Markov decision process (MDP) that is structured to allow a compact representation. We show that the well-known factored Rmax algorithm performs near-optimally on all but a number of timesteps that is polynomial in the size of the compact representation, which is often exponentially smaller than the number of states. This matches the result obtained by Kearns and Koller for their DBN-E3 algorithm, except that our analysis is conducted in a more general setting. We also extend the results to a new algorithm, factored IE, which uses the interval estimation approach to exploration and can be expected to outperform factored Rmax on most domains.
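Because the sample-complexity claim rests on counting experience per local parent configuration rather than per flat state, a minimal sketch of that bookkeeping may help. The Python below assumes a known DBN structure (each variable's parent set); the class name, the threshold `m`, and the specific bonus form are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict

class FactoredRmaxCounts:
    """Hypothetical sketch of factored-Rmax-style experience counting."""

    def __init__(self, parent_sets, m):
        self.parent_sets = parent_sets  # parent_sets[i]: variable indices that variable i depends on
        self.m = m                      # samples needed before a local factor counts as "known"
        self.counts = defaultdict(int)

    def observe(self, state, action):
        # One observed transition contributes one sample to each variable's
        # local conditional distribution P(x_i' | parents(x_i), a).
        for i, parents in enumerate(self.parent_sets):
            self.counts[(i, action, tuple(state[j] for j in parents))] += 1

    def is_known(self, state, action):
        # Rmax-style rule: (s, a) is known only when every local parent
        # configuration has at least m samples; unknown pairs are treated
        # as maximally rewarding, which drives directed exploration.
        return all(
            self.counts[(i, action, tuple(state[j] for j in parents))] >= self.m
            for i, parents in enumerate(self.parent_sets)
        )

    def exploration_bonus(self, state, action, beta=1.0):
        # Factored-IE flavor (our reading of interval estimation, not the
        # paper's exact form): replace the all-or-nothing known test with a
        # per-factor confidence-width bonus that shrinks as counts grow.
        return sum(
            beta / math.sqrt(max(
                self.counts[(i, action, tuple(state[j] for j in parents))], 1))
            for i, parents in enumerate(self.parent_sets)
        )
```

A small usage example under the same assumptions, with three binary variables where each depends on itself and its left neighbor:

```python
model = FactoredRmaxCounts(parent_sets=[(0,), (0, 1), (1, 2)], m=5)
model.observe((0, 1, 0), action=0)
print(model.is_known((0, 1, 0), 0), model.exploration_bonus((0, 1, 0), 0))
```

The key point the sketch illustrates is that the number of count cells, and hence the exploration cost, scales with the number of parent configurations in the compact representation rather than with the exponentially larger flat state space.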

[1] Leslie G. Valiant. A theory of the learnable. CACM, 1984.

[2] Leslie Pack Kaelbling. Learning in embedded systems. 1993.

[3] Craig Boutilier, et al. Context-Specific Independence in Bayesian Networks. UAI, 1996.

[4] Michael L. Littman. Probabilistic Propositional Planning: Representations and Complexity. AAAI/IAAI, 1997.

[5] Jürgen Schmidhuber, et al. Efficient model-based exploration. 1998.

[6] Michael L. Littman, et al. The Computational Complexity of Probabilistic Planning. J. Artif. Intell. Res., 1998.

[7] Daphne Koller, et al. Computing Factored Value Functions for Policies in Structured MDPs. IJCAI, 1999.

[8] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage. J. Artif. Intell. Res., 1999.

[9] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs. IJCAI, 1999.

[10] Alexander Russell, et al. A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics. NIPS, 2002.

[11] Dale Schuurmans, et al. Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs. ICML, 2002.

[12] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning. J. Mach. Learn. Res., 2001.

[13] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution. 2003.

[14] Sham M. Kakade. On the sample complexity of reinforcement learning. PhD thesis, 2003.

[15] Michael L. Littman, et al. An empirical evaluation of interval estimation for Markov decision processes. 16th IEEE International Conference on Tools with Artificial Intelligence, 2004.

[16] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time. Machine Learning, 2002.

[17] Michael L. Littman, et al. A theoretical analysis of Model-Based Interval Estimation. ICML, 2005.

[18] Richard S. Sutton, et al. Reinforcement Learning: An Introduction. IEEE Transactions on Neural Networks, 2005.

[19] Olivier Sigaud, et al. Learning the structure of Factored Markov Decision Processes in reinforcement learning problems. ICML, 2006.

[20] Lihong Li, et al. Incremental Model-based Learners With Formal Learning-Time Guarantees. UAI, 2006.

[21] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes. J. Comput. Syst. Sci., 2008.