An online algorithm for discovery and learning of predictive state representations

Predictive state representations (PSRs) are a recently proposed method of modelling discrete dynamical systems using predictions about future observations. The strength of PSRs comes from their ability to represent system state using only observable data, such as actions and observations. Current techniques for learning PSRs use Monte Carlo methods to estimate prediction probabilities, but do not take advantage of the structure of the data to extrapolate information. In this work, we present the constrained gradient algorithm, a new technique for discovery and learning of PSRs that constrains its estimated predictions to augment a gradient descent approach. This algorithm is also the first online algorithm for PSRs capable of discovering core tests. We test the algorithm on a variety of standard domains, and show that it is able to build models competitive with current techniques. This work is an extension and elaboration of published work [McCracken and Bowling, 2006].
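
For orientation, here is a brief sketch of the standard linear-PSR state and update equations (following the usual formulation in the PSR literature; the notation is assumed background and is not quoted from this paper). A linear PSR maintains as its state the vector of predictions for a set of core tests Q = {q_1, ..., q_n} given the history h:

    p(Q \mid h) = \big[\, p(q_1 \mid h),\; \dots,\; p(q_n \mid h) \,\big]^{\top}

After taking action a and receiving observation o, each entry of the state vector is updated by

    p(q_i \mid hao) \;=\; \frac{p(a o q_i \mid h)}{p(a o \mid h)} \;=\; \frac{p(Q \mid h)^{\top} m_{a o q_i}}{p(Q \mid h)^{\top} m_{a o}}

where the weight vectors m are the model parameters that a learning algorithm, such as the constrained gradient algorithm presented in this work, must estimate from data, and where identifying the core tests Q themselves is the discovery problem.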
