Optimal prefetching via data compression

Caching and prefetching are important mechanisms for speeding up access to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper, we apply a form of the competitive philosophy for the first time to the problem of prefetching to develop an optimal universal prefetcher in terms of fault rate, with particular applications to large-scale databases and hypertext systems. Our prediction algorithms for prefetching are novel in that they are based on data compression techniques that are both theoretically optimal and good in practice. Intuitively, in order to compress data effectively, you have to be able to predict future data well, and thus good data compressors should be able to predict well for purposes of prefetching. We show for powerful models such as Markov sources and mth-order Markov sources that the page fault rate incurred by our prefetching algorithms is optimal in the limit for almost all sequences of page requests.
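The intuition that a good compressor is also a good predictor can be illustrated with a small sketch. The code below is not the paper's algorithm: it assumes an LZ78-style dictionary parse, chosen here only for illustration since the abstract does not name a specific compressor, and it prefetches the pages seen most often after the current phrase. The names `LZPrefetcher`, `predict`, `access`, and `fault_rate` are hypothetical.

```python
class LZPrefetcher:
    """Toy LZ78-style predictor: a trie of parsed phrases with visit counts.

    Illustrative assumption only; not the algorithm analyzed in the paper.
    """

    def __init__(self):
        # Each trie node is a dict mapping page id -> [visit_count, child_node].
        self.root = {}
        self.node = self.root  # current position in the trie

    def predict(self, k=1):
        """Return up to k pages seen most often after the current phrase."""
        ranked = sorted(self.node.items(), key=lambda kv: -kv[1][0])
        return [page for page, _ in ranked[:k]]

    def access(self, page):
        """Record an observed page request and advance the parse."""
        if page in self.node:
            entry = self.node[page]
            entry[0] += 1
            self.node = entry[1]       # keep extending the current phrase
        else:
            self.node[page] = [1, {}]  # a new phrase ends here
            self.node = self.root      # restart parsing at the root


def fault_rate(requests, k=1):
    """Fraction of requests that were not among the k prefetched pages."""
    prefetcher = LZPrefetcher()
    faults = 0
    for page in requests:
        if page not in prefetcher.predict(k):
            faults += 1
        prefetcher.access(page)
    return faults / len(requests) if requests else 0.0
```

Calling `fault_rate(list("ab" * 1000))` runs the sketch on a strictly periodic request sequence; as the parse tree grows, phrases get longer and a smaller fraction of requests miss the prefetched page. The parameter `k` stands in for the number of pages that can be prefetched between requests, which is an assumption of this sketch.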
