Maintaining Predictions over Time without a Model

A common approach to the control problem in partially observable environments is to perform a direct search in policy space, as defined over some set of features of history. In this paper we consider predictive features, whose values are conditional probabilities of future events, given history. Since predictive features provide direct information about the agent's future, they have a number of advantages for control. However, unlike more typical features defined directly over past observations, it is not clear how to maintain the values of predictive features over time. A model could be used, since a model can make any prediction about the future, but in many cases learning a model is infeasible. In this paper we demonstrate that in some cases it is possible to learn to maintain the values of a set of predictive features even when learning a model is infeasible, and that natural predictive features can be useful for policy-search methods.
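The following sketch illustrates the general idea of maintaining a vector of predictive features recursively from actions and observations, in the spirit of TD-network / PSR state updates; it is not the paper's algorithm. Each entry of p is meant to approximate the probability of some future event (a "test") given the history so far, and the per-(action, observation) parameters W are hypothetical learned quantities, shown here only to make the update mechanics concrete.

```python
# Illustrative sketch (assumed form, not the paper's method): recursively
# maintain a vector of predictive features from the action/observation stream.
import numpy as np

rng = np.random.default_rng(0)
n_preds, n_actions, n_obs = 4, 2, 2

# One affine map per (action, observation) pair: p' = sigmoid(W[a, o] @ [p, 1]).
# In practice these parameters would be learned; here they are random.
W = rng.normal(scale=0.1, size=(n_actions, n_obs, n_preds, n_preds + 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_predictions(p, action, obs):
    """Map the current prediction vector to the next one after (action, obs)."""
    x = np.append(p, 1.0)  # current predictions plus a bias term
    return sigmoid(W[action, obs] @ x)

# Maintain predictions along an arbitrary action/observation trajectory.
p = np.full(n_preds, 0.5)  # uninformative initial predictions
for action, obs in [(0, 1), (1, 0), (0, 0)]:
    p = update_predictions(p, action, obs)
    print(action, obs, np.round(p, 3))
```

A policy-search method could then treat the maintained vector p as the feature vector on which the policy is defined, which is the role predictive features play in the approach described above.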
