A trace-driven analysis of the UNIX 4.2 BSD file system

We analyzed the UNIX 4.2 BSD file system by recording user-level activity in trace files and writing programs to analyze the traces. The tracer did not record individual read and write operations, yet still provided tight bounds on what information was accessed and when. The trace analysis shows that the average file system bandwidth needed per user is low (a few hundred bytes per second). Most of the files accessed are open only a short time and are accessed sequentially. Most new information is deleted or overwritten within a few minutes of its creation. We also wrote a simulator that uses the traces to predict the performance of caches for disk blocks. The moderate-sized caches used in UNIX reduce disk traffic for file blocks by about 50%, but larger caches (several megabytes) can eliminate 90% or more of all disk traffic. With those large caches, large block sizes (16 kbytes or more) result in the fewest disk accesses.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1985 ACM-0-89791-174-1-12 /85-0015 $00.75

1. Introduction

This paper describes a series of measurements made on the UNIX 4.2 BSD file system [5,8]. Most of the work was done in a series of term projects for a graduate course in operating systems at the University of California at Berkeley. Our goal was to collect information that would be useful in designing a shared file system for a network of personal workstations. We were interested in such questions as:

• How much network bandwidth is needed to support a diskless workstation?
• What are typical file access patterns (and what protocols will support those patterns best)?
• How should disk block caches be organized and managed?
• How much of a performance advantage do such caches provide?

We were unable to find answers to these questions in the literature, so we decided to instrument the 4.2 BSD system to collect information about file accesses. To reduce the size of the trace files and the impact of the tracer on its host systems, we did not record individual read and write requests. The information that we did collect allowed us to deduce the exact ranges of bytes accessed, although the access times were less precise than they would have been if we had logged reads and writes. Section 3 of this paper discusses the tracing technique and Section 4 describes the three systems we traced.

We wrote two programs to process the trace files: a reference pattern analyzer and a block cache simulator. Table I summarizes the most important results. Section 5 discusses the reference pattern analysis. Some of the conclusions are: individual users make only occasional (though bursty) use of the file system, and they need very little bandwidth on average (only a few hundred bytes per second per active user); files are usually open only a short time, and they tend to be read or written sequentially in their entirety; non-sequential access is rare; most of the files that are accessed are short; and most new files have

On average, about 300-600 bytes/second of file data are read or written by each active user.

About 70% of all file accesses are whole-file transfers, and about 50% of all bytes are transferred in whole-file transfers.

75% of all files are open less than 0.5 second, and 90% are open less than 10 seconds.

About 20-30% of all newly-written information is deleted within 30 seconds, and about 50% is deleted within 5 minutes.

A 4-Mbyte cache of disk blocks eliminates between 65% and 90% of all disk accesses for file data (depending on the write policy).

For a 400-kbyte disk cache, a block size of 8 kbytes results in the fewest number of disk accesses for file data. For a 4-Mbyte cache, a 16-kbyte block size is optimal.

Table I. Selected results.
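
The cache results in Table I come from replaying traces through a block cache simulator. The following is a minimal sketch of that general idea, not the simulator used in this study. It assumes LRU replacement and a hypothetical trace format of (file_id, offset, length) byte ranges, neither of which is specified in this section, and it models reads only, ignoring the write policy mentioned in Table I.

```python
from collections import OrderedDict


def simulate_lru_block_cache(accesses, cache_bytes, block_bytes):
    """Replay byte-range accesses through an LRU cache of disk blocks.

    accesses: iterable of (file_id, offset, length) tuples. This trace
        format is an assumption made for illustration only.
    Returns (block_references, misses); each miss stands for one disk access.
    """
    capacity = cache_bytes // block_bytes  # number of blocks the cache holds
    if capacity < 1:
        raise ValueError("cache must hold at least one block")

    cache = OrderedDict()  # keys are (file_id, block_number); oldest first
    references = 0
    misses = 0
    for file_id, offset, length in accesses:
        if length <= 0:
            continue
        first = offset // block_bytes
        last = (offset + length - 1) // block_bytes
        for block in range(first, last + 1):
            key = (file_id, block)
            references += 1
            if key in cache:
                cache.move_to_end(key)  # mark as most recently used
            else:
                misses += 1
                cache[key] = None
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return references, misses


# Hypothetical trace: file "a" is read whole, then "b", then "a" again.
trace = [("a", 0, 8192), ("b", 0, 4096), ("a", 0, 8192)]
big = simulate_lru_block_cache(trace, cache_bytes=16384, block_bytes=4096)
small = simulate_lru_block_cache(trace, cache_bytes=8192, block_bytes=4096)
print(big)    # (5, 3): the second read of "a" hits in the cache
print(small)  # (5, 5): two blocks of cache are too few to retain "a"
```

Running the same trace with different values of cache_bytes and block_bytes gives the kind of comparison reported in Table I, although the numbers there depend on the real traces and on the write policy.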