Long-term file system characterization

Despite evidence that long-term file system behavior differs from short-term behavior, researchers continue to use short-term benchmarks to evaluate layout policies because long-term data is difficult to collect and manage. In this thesis, we collect long-term traces and develop a set of metrics that illustrate the workload features most relevant to data layout. We find that reads and writes are both well represented in cache-miss traffic, but that individual files tend to be accessed either read-mostly or write-mostly. Existing layout policies optimize for either reads or writes without considering files' access history. We develop a hybrid policy that uses a read-optimized layout policy for read-mostly files and a write-optimized layout policy for write-mostly files. Our evaluation shows that our policy performs better for reads than read-optimizing policies and nearly as well for writes as write-optimizing policies.
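
The selection step of the hybrid policy can be illustrated with a minimal sketch. It assumes a per-file count of cache-miss reads and writes is available. The names `AccessHistory` and `choose_layout`, the 0.75 cutoff, and the fallback for files with mixed or no history are hypothetical choices made for illustration; they are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class AccessHistory:
    """Hypothetical per-file record of cache-miss reads and writes."""
    reads: int = 0
    writes: int = 0


def choose_layout(history: AccessHistory,
                  threshold: float = 0.75,
                  fallback: str = "write-optimized") -> str:
    """Pick a layout policy from a file's access history.

    `threshold` (fraction of accesses that must be reads or writes) and
    `fallback` (used for mixed or empty histories) are assumed values.
    """
    total = history.reads + history.writes
    if total == 0:
        # No history yet: nothing to classify, so use the fallback.
        return fallback
    if history.reads / total >= threshold:
        return "read-optimized"   # read-mostly file
    if history.writes / total >= threshold:
        return "write-optimized"  # write-mostly file
    # Neither kind of access dominates: use the fallback.
    return fallback


if __name__ == "__main__":
    print(choose_layout(AccessHistory(reads=90, writes=10)))
    print(choose_layout(AccessHistory(reads=5, writes=95)))
```

The sketch covers only the classification of a file from its history. How each policy actually places blocks on disk is outside its scope.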