Acting Optimally in Partially Observable Stochastic Domains

In this paper, we describe the partially observable Markov decision process (POMDP) approach to finding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment. The POMDP approach was originally developed in the operations research community and provides a formal basis for planning problems that have long been of interest to the AI community. We found the existing algorithms for computing optimal control strategies to be computationally inefficient and have therefore developed a new algorithm that is empirically more efficient. We sketch this algorithm and present preliminary results on several small problems that illustrate important properties of the POMDP approach.
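
To make the POMDP approach concrete, the sketch below shows the belief-state update at the heart of any POMDP controller: after taking an action and receiving an observation, the agent revises its probability distribution over hidden states by Bayes' rule. This is a minimal illustrative sketch in Python, not the paper's algorithm; the array layout (`T[a, s, s']`, `O[a, s', o]`) and all names are our own assumptions.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Bayes-filter update of the belief state b(s).

    Illustrative conventions (assumed, not from the paper):
      T[a, s, s'] : probability of moving from state s to s' under action a
      O[a, s', o] : probability of observing o in state s' after action a

    The updated belief satisfies
      b'(s')  proportional to  O[a, s', o] * sum_s T[a, s, s'] * b(s).
    """
    predicted = belief @ T[action]             # sum_s b(s) * T(s' | s, a)
    unnormalized = O[action, :, observation] * predicted
    return unnormalized / unnormalized.sum()   # renormalize over states

# Example with hypothetical numbers: two states, one action, two observations.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])    # T[0] is the 2x2 transition matrix
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])    # O[0, s', o]
b = np.array([0.5, 0.5])        # initial uncertainty over the two states
b = belief_update(b, action=0, observation=1, T=T, O=O)
print(b)                        # posterior belief, approx. [0.259, 0.741]
```

Because this update compresses the entire action-observation history into a single distribution, the control problem can be recast as a fully observable Markov decision process over belief states, which is the formulation the algorithms discussed in this paper operate on.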