Exponential Family Predictive Representations of State

Many agent-environment interactions can be framed as dynamical systems in which an agent takes actions and receives observations. These systems are diverse, representing such things as a biped walking, a stock price changing over time, the trajectory of a missile, or the shifting fish population in a lake. Interacting successfully with an environment often requires a model, which allows the agent to predict something about the future by summarizing the past. Two of the basic problems in modeling partially observable dynamical systems are selecting a representation of state and selecting a mechanism for maintaining that state. This thesis explores both problems from a learning perspective: we are interested in learning a predictive model directly from the data that arises as an agent interacts with its environment.

The thesis develops models of dynamical systems that represent state as a set of statistics about the short-term future, as opposed to treating state as a latent, unobservable quantity. In other words, the agent summarizes the past into predictions about the short-term future, which in turn allow it to make predictions about the infinite future. Because all parameters in such a model are defined using only observable quantities, the learning algorithms for these models are often straightforward and have attractive theoretical properties. We examine in depth the case where state is represented as the parameters of an exponential family distribution over a short-term window of future observations. We unify a number of different existing models under this umbrella, and predict and analyze new models derived from the generalization.

One goal of this research is to push models with predictively defined state toward real-world applications. We contribute models and companion learning algorithms for domains with partial observability, continuous observations, structured observations, high-dimensional observations, and/or continuous actions. Our models successfully capture standard POMDPs and benchmark nonlinear time-series problems with performance comparable to state-of-the-art models. They also allow us to perform well on novel domains which are larger than those captured by other models with predictively defined state, including traffic prediction problems and domains analogous to autonomous mobile robots with camera sensors.
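
To make the central representation concrete, here is a minimal sketch of the idea under assumed notation (the symbols below are illustrative, not necessarily the thesis's own). Let h_t be the history of actions and observations through time t, and let F_t = (o_{t+1}, ..., o_{t+n}) be a window of the next n observations. The state is the natural parameter vector \eta_t of an exponential family distribution over that window:

    p(F_t \mid h_t) = \exp\big( \eta_t^\top \phi(F_t) - A(\eta_t) \big),

where \phi is a fixed feature function and A is the log-partition function. Maintaining state then means mapping \eta_t to \eta_{t+1} as each new action and observation arrives. One natural choice, sketched below in Python, makes the new parameters a learned linear function of the current parameters and features of the latest action-observation pair; this is an illustrative update rule, not the thesis's exact algorithm, and the names update_state, G, and b are hypothetical:

    import numpy as np

    def update_state(eta, ao_feats, G, b):
        """Advance the predictive state by one time step.

        eta      -- natural parameters of the exponential family
                    distribution over the next-n-observation window
                    (this vector *is* the state).
        ao_feats -- feature vector for the latest action-observation pair.
        G, b     -- parameters learned from data; here the update is
                    linear in (eta, ao_feats), a form assumed purely
                    for illustration.
        """
        return G @ np.concatenate([eta, ao_feats]) + b

Because \eta_t is defined entirely in terms of predictions about observable quantities, learning reduces to fitting the update map (here G and b) so that the induced window distributions match the observed data.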
