Partially Observable Markov Decision Processes for Artificial Intelligence

In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. In many cases, we have developed new ways of viewing the problem that are, perhaps, more consistent with the AI perspective. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). We then outline a novel algorithm for solving POMDPs off-line and show how, in many cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a simple example.