Paper information - Probabilistic inference for solving discrete and continuous state Markov Decision Processes

Probabilistic inference for solving discrete and continuous state Markov Decision Processes

Inference in Markov Decision Processes has recently received interest as a means to infer the goals of an observed action, to perform policy recognition, and as a tool to compute policies. A particularly interesting aspect of the approach is that any existing inference technique for dynamic Bayesian networks (DBNs) now becomes available for answering behavioral questions, including those on continuous, factorial, or hierarchical state representations. Here we present an Expectation Maximization algorithm for computing optimal policies. Unlike previous approaches, we can show that this actually optimizes the discounted expected future return for arbitrary reward functions and without assuming an ad hoc finite total time. The algorithm is generic in that any inference technique can be utilized in the E-step. We demonstrate this for exact inference on a discrete maze and for Gaussian belief state propagation in continuous stochastic optimal control problems.
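To make the idea of an EM-style policy update concrete, the following is a minimal sketch on a toy problem; it is not the paper's implementation. Everything in it is an assumption made for illustration: the three-state chain, the 0.9 success probability, the reward scaled to [0, 1], and the function names `transition`, `reward`, `evaluate`, and `em_policy_search`. The E-step is replaced here by plain iterative policy evaluation rather than message passing in a DBN, and the M-step is taken to be a multiplicative reweighting of each action's probability by its discounted return.

```python
GAMMA = 0.9                 # discount factor (assumed value)
N_STATES, N_ACTIONS = 3, 2  # hypothetical chain; actions: 0 = left, 1 = right


def transition(s, a):
    """Return {next_state: probability}; the intended move succeeds w.p. 0.9."""
    step = 1 if a == 1 else -1
    target = min(max(s + step, 0), N_STATES - 1)  # walls at both ends
    probs = {s: 0.1}                              # slip: stay in place
    probs[target] = probs.get(target, 0.0) + 0.9
    return probs


def reward(s, a):
    """Reward in [0, 1]: 1 while in the right-most (goal) state, else 0."""
    return 1.0 if s == N_STATES - 1 else 0.0


def evaluate(policy, sweeps=200):
    """Stand-in for the E-step: compute Q(s, a) under the current policy."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        # State values under the stochastic policy.
        v = [sum(policy[s][a] * q[s][a] for a in range(N_ACTIONS))
             for s in range(N_STATES)]
        # One Bellman backup for every state-action pair.
        q = [[reward(s, a) + GAMMA * sum(p * v[s2]
                                         for s2, p in transition(s, a).items())
              for a in range(N_ACTIONS)]
             for s in range(N_STATES)]
    return q


def em_policy_search(iterations=100):
    """Alternate evaluation with the update pi(a|s) <- pi(a|s) * Q(s, a) / Z."""
    policy = [[1.0 / N_ACTIONS] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iterations):
        q = evaluate(policy)
        for s in range(N_STATES):
            weights = [policy[s][a] * q[s][a] for a in range(N_ACTIONS)]
            total = sum(weights)
            if total > 0.0:  # guard: leave the row unchanged if all weights vanish
                policy[s] = [w / total for w in weights]
    return policy


if __name__ == "__main__":
    for s, row in enumerate(em_policy_search()):
        print(s, [round(p, 4) for p in row])
```

In this toy chain the probability mass shifts toward the "right" action in every state as the iterations proceed, since that action has the higher discounted return under each intermediate policy. A greedy M-step (choosing the maximizing action outright) would reach a deterministic policy faster; the multiplicative form is used here only because it keeps the policy stochastic throughout.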

Marc Toussaint | Amos J. Storkey
