Approximating Optimal Policies for Partially Observable Stochastic Domains

The problem of making optimal decisions in uncertain conditions is central to Artificial Intelligence. If the state of the world is known at all times, the world can be modeled as a Markov Decision Process (MDP). MDPs have been studied extensively, and many methods are known for determining optimal courses of action, or policies. The more realistic case, in which state information is only partially observable, is modeled by Partially Observable Markov Decision Processes (POMDPs), which have received much less attention. The best exact algorithms for these problems can be very inefficient in both space and time. We introduce Smooth Partially Observable Value Approximation (SPOVA), a new approximation method that can quickly yield good approximations that improve over time. This method can be combined with reinforcement learning methods, a combination that was very effective in our test cases.
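To make the POMDP setting concrete, the sketch below shows the standard Bayesian belief-state update, together with a smooth (p-norm) approximation to the maximum over a set of linear value vectors, which is one way a differentiable value representation of the kind the method's name suggests could look. The specific transition and observation matrices, the α-vectors, and the smoothing exponent `k` are all hypothetical illustrations, not taken from the paper.

```python
import numpy as np

# Hypothetical tiny POMDP: 2 states, 2 actions, 2 observations.
# T[a][s, s'] = P(s' | s, a); O[a][s', o] = P(o | s', a). All values assumed.
T = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.5, 0.5]])]
O = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.6, 0.4], [0.4, 0.6]])]

def belief_update(b, a, o):
    """Standard POMDP belief update: b'(s') ∝ O(o | s', a) * Σ_s T(s' | s, a) b(s)."""
    unnorm = O[a][:, o] * (T[a].T @ b)
    return unnorm / unnorm.sum()

def smooth_value(b, alpha_vectors, k=4):
    """Smooth stand-in for max_i (b · α_i): a p-norm over the dot products,
    which approaches the true max as k grows but stays differentiable."""
    dots = np.maximum(alpha_vectors @ b, 1e-12)  # clamp so powers stay well-defined
    return (dots ** k).sum() ** (1.0 / k)

# Example: update a uniform belief after taking action 0 and observing 1,
# then evaluate it against two hypothetical α-vectors.
b = belief_update(np.array([0.5, 0.5]), a=0, o=1)
alphas = np.array([[1.0, 0.0], [0.2, 0.9]])
print(b, smooth_value(b, alphas))
```

Because the smoothed value is differentiable in both the belief and the α-vectors, it can be adjusted by gradient methods, which is what makes combining such a representation with reinforcement learning updates plausible.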