Design and Implementation of a Predictive File Prefetching Algorithm

We have previously shown that the patterns in which files are accessed offer information that can accurately predict upcoming file accesses. Most modern caches ignore these patterns, thereby failing to use information that enables significant reductions in I/O latency. While prefetching heuristics that expect sequential accesses are often effective at reducing I/O latency, they cannot be applied across files, because the abstraction of a file has no intrinsic concept of a successor. This limits the ability of modern file systems to prefetch. Here we present our implementation of a predictive prefetching system that makes use of file access patterns to reduce I/O latency. Previously we developed a technique called Partitioned Context Modeling (PCM) [13] that efficiently models file accesses to reliably predict upcoming requests. We present our experiences in implementing predictive prefetching based on file access patterns. From the lessons learned we developed a new technique, Extended Partitioned Context Modeling (EPCM), which has even better performance. We have modified the Linux kernel to prefetch file data based on PCM and EPCM. With this implementation we examine how a prefetching policy that uses such models to predict upcoming accesses can result in large reductions in I/O latency. We tested our implementation with four different application-based benchmarks and saw I/O latency reduced by 31% to 90% and elapsed time reduced by 11% to 16%.

tmk@cips.nokia.com. Supported in part by the Usenix Association and the National Science Foundation under Grant CCR-9704347.
darrell@cse.ucsc.edu. Supported in part by the National Science Foundation under Grant CCR-9704347.

[1]  Magnus,et al.  Linux Kernel Internals with Cdrom, 1997.

[2]  Jeffrey C. Mogul,et al.  Using predictive prefetching to improve World Wide Web latency , 1996, CCRV.

[3]  Brian N. Bershad,et al.  A trace-driven comparison of algorithms for parallel prefetching and caching , 1996, OSDI '96.

[4]  Hui Lei,et al.  An analytical approach to file prefetching, 1997.

[5]  Geoffrey H. Kuenning,et al.  An Analysis of Trace Data for Predictive File Caching in Mobile Computing , 1994, USENIX Summer.

[6]  Darrell D. E. Long,et al.  Predicting Future File-System Actions From Prior Events , 1996, USENIX Annual Technical Conference.

[7]  Abraham Lempel,et al.  Compression of individual sequences via variable-rate coding , 1978, IEEE Trans. Inf. Theory.

[8]  Thomas M. Kroeger,et al.  Modeling file access patterns to improve caching performance, 2000.

[9]  J. Howard,et al.  Scale and performance in a distributed file system, 1988.

[10]  Daniel A. Reed,et al.  Input/output access pattern classification using hidden Markov models , 1997, IOPADS '97.

[11]  Jeffrey Katcher,et al.  PostMark: A New File System Benchmark, 1997.

[12]  Jeanna Neefe Matthews,et al.  Improving the performance of log-structured file systems with adaptive methods , 1997, SOSP.

[13]  James Griffioen, Randy Appleton.  Performance Measurements of Automatic Prefetching, 1995.

[14]  Donald E. Knuth,et al.  Sorting and Searching, 1973.

[15]  Ian H. Witten,et al.  Text Compression , 1990, 125 Problems in Text Algorithms.

[16]  Garth A. Gibson,et al.  Automatic I/O hint generation through speculative execution , 1999, OSDI '99.

[17]  Anna R. Karlin,et al.  A study of integrated prefetching and caching strategies , 1995, SIGMETRICS '95/PERFORMANCE '95.

[18]  Geoffrey H. Kuenning,et al.  Automated hoarding for mobile computers , 1997, SOSP.

[19]  Robert Magnus,et al.  Linux Kernel Internals, 1996.

[20]  Azer Bestavros,et al.  Speculative data dissemination and service to reduce server load, network traffic and service time in distributed information systems , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[21]  Richard A. Golding,et al.  Predicting File System Actions from Reference Patterns, 1996.

[22]  P. Krishnan,et al.  Optimal prefetching via data compression , 1991, [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science.

[23]  Trevor N. Mudge,et al.  Analysis of branch prediction via data compression , 1996, ASPLOS VII.

[24]  John K. Ousterhout,et al.  Why Aren't Operating Systems Getting Faster As Fast as Hardware? , 1990, USENIX Summer.

[25]  Darrell D. E. Long,et al.  The case for efficient file access pattern modeling , 1999, Proceedings of the Seventh Workshop on Hot Topics in Operating Systems.

[26]  Jim Zelenka,et al.  Informed prefetching and caching , 1995, SOSP.

[27]  Udi Manber,et al.  GLIMPSE: A Tool to Search Through Entire File Systems , 1994, USENIX Winter.