The case for efficient file access pattern modeling

Most modern I/O systems treat each file access independently. However events in a computer system are driven by programs. Thus, accesses to files occur in consistent patterns and are by no means independent. The result is that modern I/O systems ignore useful information. Using traces of file system activity we show that file accesses are strongly correlated with preceding accesses. In fact, a simple last-successor model (one that predicts each file access will be followed by the same file that followed the last time it was accessed) successfully predicted the next file 72% of the time. We examine the ability of two previously proposed models for file access prediction in comparison to this baseline model and see a stark contrast in accuracy and high overheads in state space. We then enhance one of these models to address the issues of model space requirements. This new model is able to improve an additional 10% on the accuracy of the last-successor model, while working within a state space that is within a constant factor (relative to the number of files) of the last successor model. While this work was motivated by the use of file relationships for l/O prefetching, information regarding the likelihood of file access patterns has several other uses such as disk layout and file clustering for disconnected operation.

[1]  M. Frans Kaashoek,et al.  Embedded Inodes and Explicit Grouping: Exploiting Disk Bandwidth for Small Files , 1997, USENIX Annual Technical Conference.

[2]  Hui Lei,et al.  An analytical approach to file prefetching , 1997 .

[3]  Jeffrey C. Mogul,et al.  Using predictive prefetching to improve World Wide Web latency , 1996, CCRV.

[4]  Jeanna Neefe Matthews,et al.  Improving the performance of log-structured file systems with adaptive methods , 1997, SOSP.

[5]  Geoffrey H. Kuenning,et al.  The Design of the SEER Predictive Caching System , 1994, 1994 First Workshop on Mobile Computing Systems and Applications.

[6]  Trevor N. Mudge,et al.  Analysis of branch prediction via data compression , 1996, ASPLOS VII.

[7]  Mahadev Satyanarayanan,et al.  Coda: A Highly Available File System for a Distributed Workstation Environment , 1990, IEEE Trans. Computers.

[8]  Daniel A. Reed,et al.  Input/output access pattern classification using hidden Markov models , 1997, IOPADS '97.

[9]  James Griffioen Randy Appleton Performance Measurements of Automatic Prefetching , 1995 .

[10]  Drew Roselli,et al.  Characteristics of File System Workloads , 1998 .

[11]  Thomas M. Kroeger,et al.  Predicting file system actions from prior events , 1996 .

[12]  P. Krishnan,et al.  Optimal prefetching via data compression , 1991, [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science.

[13]  Mahadev Satyanarayanan,et al.  Coda: a highly available file system for a distributed workstation environment , 1989, Proceedings of the Second Workshop on Workstation Operating Systems.