Improved QMDP Policy for Partially Observable Markov Decision Processes in Large Domains: Embedding Exploration Dynamics

Abstract Artificial Intelligence techniques have primarily focused on domains in which the state of the world is fully known to the system at each point in time. Such domains can be modeled as a Markov Decision Process (MDP). Action and planning policies for MDPs have been studied extensively, and several efficient methods exist. In real-world problems, however, pieces of information useful for the process of action selection are often missing. The theory of Partially Observable Markov Decision Processes (POMDPs) covers the problem domain in which the full state of the environment is not directly perceivable by the agent. Current algorithms for the exact solution of POMDPs are applicable only to domains with a small number of states. To cope with larger state spaces, a number of methods that achieve sub-optimal solutions exist, and among these the QMDP approach appears to be the best. We introduce a novel technique, called Explorative QMDP (EQMDP), which constitutes an important enhancement of the QMDP ...
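For reference, the sketch below illustrates the standard QMDP heuristic that the abstract builds on: run value iteration on the underlying fully observable MDP, then select the action whose Q-values, weighted by the current belief, are largest. This is a minimal illustration of the generic QMDP rule, not the paper's EQMDP variant; the array shapes, parameter names (T, R, b, gamma), and the toy problem are assumptions for the example.

```python
import numpy as np

def qmdp_action(T, R, b, gamma=0.95, n_iter=200):
    """Select an action with the standard QMDP heuristic (illustrative sketch).

    T: transition tensor, shape (A, S, S), T[a, s, s'] = P(s' | s, a)
    R: reward matrix, shape (S, A)
    b: belief vector over states, shape (S,)
    """
    # Value iteration on the underlying MDP, ignoring partial observability.
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = R + gamma * np.einsum("ast,t->sa", T, V)  # Q[s, a]
        V = Q.max(axis=1)
    # QMDP: weight the MDP Q-values by the current belief and act greedily.
    return int(np.argmax(b @ Q))

# Toy usage on a random 4-state, 2-action problem (hypothetical data).
rng = np.random.default_rng(0)
T = rng.random((2, 4, 4))
T /= T.sum(axis=2, keepdims=True)
R = rng.random((4, 2))
b = np.full(4, 0.25)
print(qmdp_action(T, R, b))
```

Because the belief-weighted Q-values assume all uncertainty disappears after one step, plain QMDP never takes actions purely to gain information, which is the limitation the exploration dynamics of EQMDP are meant to address.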
