An Algorithm for Pattern Discovery in Time Series

Author(s): Shalizi, Cosma Rohilla; Shalizi, Kristina Lisa; Crutchfield, James P

Abstract: We present a new algorithm for discovering patterns in time series and other sequential data. We exhibit a reliable procedure for building the minimal set of hidden, Markovian states that is statistically capable of producing the behavior exhibited in the data -- the underlying process's causal states. Unlike conventional methods for fitting hidden Markov models (HMMs) to data, our algorithm makes no assumptions about the process's causal architecture (the number of hidden states and their transition structure), but rather infers it from the data. It starts with assumptions of minimal structure and introduces complexity only when the data demand it. Moreover, the causal states it infers have important predictive optimality properties that conventional HMM states lack. We introduce the algorithm, review the theory behind it, prove its asymptotic reliability, use large deviation theory to estimate its rate of convergence, and compare it to other algorithms that also construct HMMs from data. We also illustrate its behavior on an example process and report selected numerical results from an implementation.
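
As a rough illustration of the idea of introducing states only when the data demand it, the sketch below estimates the next-symbol distribution after each fixed-length history and puts two histories in the same group when those estimates agree. It is not the algorithm presented in the paper: the function names, the fixed `history_length`, and the fixed `tolerance` (standing in for a proper statistical test) are all assumptions made for illustration, and comparing only the next symbol is a simplification of comparing whole futures.

```python
from collections import Counter, defaultdict


def next_symbol_distributions(sequence, history_length):
    """Estimate P(next symbol | preceding history) by simple counting."""
    counts = defaultdict(Counter)
    for i in range(history_length, len(sequence)):
        history = sequence[i - history_length:i]
        counts[history][sequence[i]] += 1
    return {
        history: {s: n / sum(c.values()) for s, n in c.items()}
        for history, c in counts.items()
    }


def group_histories(distributions, tolerance=0.05):
    """Greedily group histories whose estimated distributions agree.

    A new group is opened only when a history's distribution differs from
    every existing group's representative by more than `tolerance`
    (a hypothetical stand-in for a statistical test).
    """
    groups = []  # list of (representative distribution, member histories)
    for history in sorted(distributions):
        dist = distributions[history]
        for rep, members in groups:
            symbols = set(rep) | set(dist)
            if all(abs(rep.get(s, 0.0) - dist.get(s, 0.0)) <= tolerance
                   for s in symbols):
                members.append(history)
                break
        else:
            groups.append((dist, [history]))
    return [members for _, members in groups]


if __name__ == "__main__":
    alternating = "01" * 50
    dists = next_symbol_distributions(alternating, history_length=1)
    print(group_histories(dists))
```

On a strictly alternating binary sequence the two length-1 histories predict different next symbols, so they stay in separate groups; on a constant sequence there is only one history and hence one group.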
