Multilevel caching in distributed file systems

Caching plays an important role in improving the performance of distributed file systems. This thesis explores how intermediate cache servers affect the performance and scalability of such systems. Using distributed file system traces and cache simulations, we explore extensions and modifications to the traditional client caching model employed in file systems such as AFS and Sprite, and we evaluate intermediate cache servers in three roles.
First, we find that an intermediate cache server (a machine logically interposed between clients and servers that provides cache service to the clients) typically reduces read requests to primary file servers by 40% to 60%. This decreases file server load and thus increases system scalability.
Second, we consider a mass storage based file system in which the primary file system resides on a bank of tapes. An intermediate cache server is an essential component of this tape-based system, necessary for overcoming the access latency of mass storage. Employing a delayed-write caching policy at the intermediate cache server allows it to satisfy typically 70% to 80% of the file system requests that are not satisfied by client caches. This result, combined with high client cache hit rates (typically 80%), supports the practicality of our system.
Third, we investigate modifications to the AFS cache model that improve file system performance over increasingly prevalent low-speed networks. We find that an intermediate cache server on the client side of the low-speed link can increase client performance by mediating low-speed link traffic. With a delayed-write caching policy, such a server can typically delay 50% to 74% of write requests until a daily sync period. Performing these writes during inactive periods improves performance for interactive traffic over the low-speed link as well as for other traffic on that link.
These three topics are related through the architecture they share. We use four file system traces in detailed trace-driven simulations to evaluate intermediate cache servers in all three roles, and the same traces and simulations yield results applicable to each.
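As a rough illustration of what a trace-driven simulation of this architecture might look like, the following Python sketch replays a trace through per-client caches and one shared intermediate cache. It is not the simulator used in this thesis. The names `LRUCache` and `simulate`, the trace record layout, whole-file LRU replacement, write-through clients, and the absence of cache consistency traffic are all simplifying assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU cache of whole files (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # file_id -> dirty flag

    def lookup(self, file_id):
        """Return True on a hit and mark the file most recently used."""
        if file_id in self.entries:
            self.entries.move_to_end(file_id)
            return True
        return False

    def insert(self, file_id, dirty=False):
        """Insert or refresh a file; return the dirty files evicted."""
        dirty = dirty or self.entries.get(file_id, False)
        self.entries[file_id] = dirty
        self.entries.move_to_end(file_id)
        flushed = []
        while len(self.entries) > self.capacity:
            victim, was_dirty = self.entries.popitem(last=False)
            if was_dirty:
                flushed.append(victim)
        return flushed


def simulate(trace, client_capacity, intermediate_capacity):
    """Replay (client_id, op, file_id) records, op being "read" or "write"."""
    clients = {}
    intermediate = LRUCache(intermediate_capacity)
    stats = {"reads": 0, "client_hits": 0, "intermediate_hits": 0,
             "server_reads": 0, "writes": 0, "forced_flushes": 0}
    for client_id, op, file_id in trace:
        cache = clients.setdefault(client_id, LRUCache(client_capacity))
        if op == "read":
            stats["reads"] += 1
            if cache.lookup(file_id):
                stats["client_hits"] += 1
                continue
            if intermediate.lookup(file_id):
                stats["intermediate_hits"] += 1
            else:
                # Miss at both levels: fetch from the primary server.
                stats["server_reads"] += 1
                evicted = intermediate.insert(file_id)
                stats["forced_flushes"] += len(evicted)
            cache.insert(file_id)
        else:
            stats["writes"] += 1
            cache.insert(file_id)  # assumed: clients write through
            # Delayed write: the intermediate holds the dirty file until
            # it is evicted or the sync period arrives.
            evicted = intermediate.insert(file_id, dirty=True)
            stats["forced_flushes"] += len(evicted)
    # Dirty files still held here are the ones deferred to the sync period.
    stats["pending_at_sync"] = sum(intermediate.entries.values())
    return stats
```

Given a real trace, `intermediate_hits / (reads - client_hits)` would estimate the share of client-cache misses absorbed by the intermediate server, and `pending_at_sync` would count the dirty files whose write-back was deferred. The counts from this toy model are not expected to reproduce the percentages reported above.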