A tutorial on partially observable Markov decision processes

Abstract: The partially observable Markov decision process (POMDP) model of environments was first explored in the engineering and operations research communities 40 years ago. More recently, researchers in artificial intelligence and machine learning have embraced the model, producing a flurry of solution algorithms that can identify optimal or near-optimal behavior in many environments represented as POMDPs. This article introduces the POMDP model to behavioral scientists who may wish to apply the framework to understanding normative behavior in experimental settings. It includes concrete examples using a publicly available POMDP solution package.
