Paper Information - Efficient Reinforcement Learning in Factored MDPs

Efficient Reinforcement Learning in Factored MDPs

We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent E^3 algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning and the graphical structure (but not the parameters) of the DBN. Unlike the original E^3 algorithm, our new algorithm exploits the DBN structure to achieve a running time that scales polynomially in the number of parameters of the DBN, which may be exponentially smaller than the number of global states.
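To make the gap between DBN parameters and global states concrete, here is a minimal sketch in Python. It is not the paper's algorithm. The ring-shaped parent structure, the binary variables, the CPT values, and all function names are assumptions chosen for illustration. It shows only how a factored transition probability is a product of local conditional probabilities, and how the parameter count compares with the size of the global state space.

```python
from itertools import product

# Hypothetical toy DBN (an assumption for illustration): n binary state
# variables, where the next value of variable i depends only on the current
# values of variables i-1 and i (ring structure) and on the action.
def parents(i, n):
    return ((i - 1) % n, i)

def count_dbn_parameters(n, n_actions):
    # One free parameter, P(x_i' = 1 | parents, action), per parent
    # assignment and per action, for each variable.
    return sum(n_actions * 2 ** len(parents(i, n)) for i in range(n))

def count_global_states(n):
    # Each of the n binary variables doubles the number of global states.
    return 2 ** n

def make_cpts(n, n_actions):
    # Arbitrary illustrative CPT values in (0, 1); not taken from the paper.
    cpts = []
    for i in range(n):
        table = {}
        for a in range(n_actions):
            for pa in product((0, 1), repeat=len(parents(i, n))):
                table[(a, pa)] = 0.1 + 0.2 * sum(pa) + 0.1 * a
        cpts.append(table)
    return cpts

def transition_prob(cpts, state, action, next_state):
    # P(next_state | state, action) as a product of local conditionals,
    # one factor per state variable.
    n = len(state)
    prob = 1.0
    for i in range(n):
        pa_values = tuple(state[j] for j in parents(i, n))
        p_one = cpts[i][(action, pa_values)]
        prob *= p_one if next_state[i] == 1 else 1.0 - p_one
    return prob

if __name__ == "__main__":
    n, n_actions = 20, 2
    print("DBN parameters:", count_dbn_parameters(n, n_actions))
    print("Global states: ", count_global_states(n))
```

In this toy setting with 20 binary variables and 2 actions, the DBN has 160 parameters while the global state space has 1,048,576 states. The parameter count grows linearly in the number of variables when each variable has a bounded number of parents, whereas the state count grows exponentially.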

Michael Kearns | Daphne Koller
