Predicting when not to predict

File prefetching based on previous file access patterns has been shown to be an effective means of reducing file system latency by implicitly loading caches with files that are likely to be needed in the near future. Mistaken prefetching requests can be very costly in terms of added performance overheads, including increased latency and bandwidth consumption. Such costs of mispredictions are easily overlooked when considering access prediction algorithms only in terms of their accuracy; we describe a novel algorithm that uses machine learning not only to improve overall prediction accuracy, but also as a means to avoid those costly mispredictions. Our algorithm is fully adaptive to changing workloads, and is fully automated in its ability to refrain from offering predictions when they are likely to be mistaken. Our trace-based simulations show that our algorithm produces prediction accuracies of up to 98%. While this appears to be at the expense of a very slight reduction in cache hit ratios, application of this algorithm actually results in substantial reductions in unnecessary (and costly) I/O operations.

[1]  Thomas M. Kroeger,et al.  Predicting file system actions from prior events , 1996 .

[2]  Scott A. Brandt,et al.  Performing file prediction with a program-based successor model , 2001, MASCOTS 2001, Proceedings Ninth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.

[3]  Darrell D. E. Long,et al.  Noah: low-cost file access prediction through pairs , 2001, Conference Proceedings of the 2001 IEEE International Performance, Computing, and Communications Conference (Cat. No.01CH37210).

[4]  Mahadev Satyanarayanan,et al.  Long Term Distributed File Reference Tracing: Implementation and Experience , 1996, Softw. Pract. Exp..

[5]  Geoffrey H. Kuenning,et al.  Simplifying automated hoarding methods , 2002, MSWiM '02.

[6]  Vladimir Vovk,et al.  Aggregating strategies , 1990, COLT '90.

[7]  P. Krishnan,et al.  Optimal prefetching via data compression , 1991, [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science.

[8]  Ahmed Amer,et al.  File access prediction with adjustable accuracy , 2002, Conference Proceedings of the IEEE International Performance, Computing, and Communications Conference (Cat. No.02CH37326).

[9]  Jim Griffioen,et al.  Reducing File System Latency using a Predictive Approach , 1994, USENIX Summer.

[10]  Edward Y. Chang,et al.  Design and Implementation of Semi-preemptible IO , 2003, FAST.

[11]  Scott A. Brandt,et al.  ACME: Adaptive Caching Using Multiple Experts , 2002, WDAS.

[12]  Manfred K. Warmuth,et al.  The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.

[13]  Jim Zelenka,et al.  Informed prefetching and caching , 1995, SOSP.

[14]  Mahadev Satyanarayanan,et al.  Coda: A Highly Available File System for a Distributed Workstation Environment , 1990, IEEE Trans. Computers.

[15]  John Wilkes,et al.  UNIX Disk Access Patterns , 1993, USENIX Winter.

[16]  Laszlo A. Belady,et al.  On-Line Measurement of Paging Behavior by the Multivalued MIN Algorithm , 1974, IBM J. Res. Dev..

[17]  M. Satyanarayanan,et al.  Long Term Distributed File Reference Tracing: Implementation and Experience , 1996 .

[18]  Darrell D. E. Long,et al.  The case for efficient file access pattern modeling , 1999, Proceedings of the Seventh Workshop on Hot Topics in Operating Systems.

[19]  David Haussler,et al.  How to use expert advice , 1993, STOC.

[20]  John K. Ousterhout,et al.  Why Aren't Operating Systems Getting Faster As Fast as Hardware? , 1990, USENIX Summer.

[21]  Prashant J. Shenoy,et al.  Rules of thumb in data engineering , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[22]  Geoffrey H. Kuenning,et al.  An Analysis of Trace Data for Predictive File Caching in Mobile Computing , 1994, USENIX Summer.

[23]  Darrell D. E. Long,et al.  Design and Implementation of a Predictive File Prefetching Algorithm , 2001, USENIX Annual Technical Conference, General Track.

[24]  Sally A. McKee,et al.  Hitting the memory wall: implications of the obvious , 1995, CARN.

[25]  Hui Lei,et al.  An analytical approach to file prefetching , 1997 .

[26]  Randal C. Burns,et al.  Group-based management of distributed file caches , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[27]  Randal C. Burns,et al.  Using multiple predictors to improve the accuracy of file access predictions , 2003, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003. (MSST 2003). Proceedings..