Discrete Sequence Prediction and Its Applications

Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We present a simple and practical algorithm (TDAG) for discrete sequence prediction. Based on a text-compression method, the TDAG algorithm limits the growth of storage by retaining the most likely prediction contexts and discarding (forgetting) less likely ones. The storage/speed tradeoffs are parameterized so that the algorithm can be used in a variety of applications. Our experiments verify its performance on data compression tasks and show how it applies to two problems: dynamically optimizing Prolog programs for good average-case behavior and maintaining a cache for a database on mass storage.
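The idea of retaining likely prediction contexts and forgetting unlikely ones can be illustrated with a small sketch. The code below is not the published TDAG algorithm. It is a simplified bounded-memory context model, and the class name `ContextPredictor` and the parameters `max_order`, `max_contexts`, and `min_count` are hypothetical names chosen for illustration. It counts which symbol follows each recent context, predicts from the longest context seen often enough, and discards the least frequently seen contexts once a storage limit is exceeded.

```python
from collections import Counter, defaultdict


class ContextPredictor:
    """Toy bounded-memory context model (illustrative only; not the published TDAG)."""

    def __init__(self, max_order=3, max_contexts=1000, min_count=2):
        self.max_order = max_order        # longest context length tracked (assumed knob)
        self.max_contexts = max_contexts  # storage limit on stored contexts (assumed knob)
        self.min_count = min_count        # observations needed before a context is trusted
        self.counts = defaultdict(Counter)  # context tuple -> counts of the next symbol
        self.history = []                   # the most recent max_order symbols

    def observe(self, symbol):
        """Record that `symbol` followed each suffix of the recent history."""
        n = len(self.history)
        for k in range(min(self.max_order, n) + 1):
            ctx = tuple(self.history[n - k:])
            self.counts[ctx][symbol] += 1
        self.history.append(symbol)
        if len(self.history) > self.max_order:
            self.history.pop(0)
        if len(self.counts) > self.max_contexts:
            self._prune()

    def _prune(self):
        """Forget the least frequently seen contexts; the empty context is always kept."""
        ranked = sorted(
            self.counts,
            key=lambda c: (c != (), -sum(self.counts[c].values())),
        )
        for ctx in ranked[self.max_contexts:]:
            del self.counts[ctx]

    def predict(self):
        """Return the most frequent successor of the longest sufficiently seen context."""
        n = len(self.history)
        for k in range(min(self.max_order, n), -1, -1):
            seen = self.counts.get(tuple(self.history[n - k:]))
            if seen and sum(seen.values()) >= self.min_count:
                return seen.most_common(1)[0][0]
        return None  # nothing has been observed often enough yet
```

In this sketch, lowering `max_contexts` or `max_order` trades prediction quality for less storage and faster updates, which is the kind of storage/speed tradeoff the abstract describes. The exact parameters and pruning rule of TDAG are not given here and may differ.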
