Identifying and Using Patterns in Sequential Data

Whereas basic machine learning research has mostly viewed input data as an unordered random sample from a population, researchers have also studied learning from data whose order carries regular structure. Doing so requires that we regard the input data as a stream and identify regularities in the data values as they occur. In this brief survey I review three sequential-learning problems, examine some new, and not-so-new, algorithms for learning from sequences, and give applications for these methods. The three generic problems I discuss are: (1) predicting sequences of discrete symbols generated by stochastic processes; (2) learning streams by extrapolation from a general rule; and (3) learning to predict time series.
