Solving partially observable problems with inaccurate PSR models

Modeling dynamical systems is a common approach to solving partially observable problems in artificial intelligence. Predictive state representations (PSRs) have been proposed as an alternative to partially observable Markov decision processes (POMDPs) for modeling dynamical systems. Although POMDPs and PSRs provide general frameworks for solving partially observable problems, both rely heavily on a known and accurate model of the environment. In real-world applications, however, building an accurate model is extremely difficult. In this paper, we propose an algorithm that solves partially observable problems using an inaccurate PSR model learned from samples. The proposed algorithm can also improve the accuracy of the learned model. Given the inaccurate PSR model, the PSR state is identified first; traditional Markov decision process (MDP) techniques are then used to solve the partially observable problem. Furthermore, the learned model can be reset when it gets off-track, as often happens when a model is learned from samples. The effectiveness of the proposed algorithm is demonstrated on a standard set of POMDP test problems.
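To make the state-identification and reset steps concrete, the following is a minimal sketch of the standard linear PSR state update, not the algorithm proposed in this paper. The names `psr_update` and `track`, the parameter layout, the off-track test (a non-positive normalizer or a prediction outside [0, 1]), and the choice to reset to the initial state are all illustrative assumptions; the abstract does not specify how the reset is triggered or performed.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(u, v))


def psr_update(state, m_ao, M_ao, eps=1e-9):
    """One linear-PSR state update after action a and observation o.

    state: predictions p(Q|h) of the core tests at the current history h.
    m_ao:  weight vector such that p(ao|h) = state . m_ao.
    M_ao:  one weight vector per core test q_i, such that
           p(ao q_i|h) = state . M_ao[i].
    Returns the new prediction vector p(Q|hao), or None when the update
    looks off-track (assumed criterion, see the lead-in).
    """
    denom = dot(state, m_ao)
    if denom <= eps:
        # The model assigns (almost) zero probability to what was observed.
        return None
    new_state = [dot(state, w) / denom for w in M_ao]
    if any(p < -eps or p > 1.0 + eps for p in new_state):
        # Predictions are probabilities; values outside [0, 1] are invalid.
        return None
    return new_state


def track(initial_state, params, history):
    """Track the PSR state along a history of (action, observation) pairs.

    params maps each (action, observation) pair to its (m_ao, M_ao).
    When an update is off-track, the state is reset to initial_state
    (a hypothetical reset rule chosen for illustration).
    Returns the final state and the number of resets performed.
    """
    state = list(initial_state)
    resets = 0
    for ao in history:
        m_ao, M_ao = params[ao]
        nxt = psr_update(state, m_ao, M_ao)
        if nxt is None:
            state = list(initial_state)
            resets += 1
        else:
            state = nxt
    return state, resets
```

Once a state vector of this kind is available, it can be treated as the state of an ordinary MDP, which is the role the identified PSR state plays in the approach described above.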
