Generating realistic impressions for file-system benchmarking

The performance of file systems and related software depends on characteristics of the underlying file-system image (i.e., file-system metadata and file contents). Unfortunately, rather than benchmarking with realistic file-system images, most system designers and evaluators rely on ad hoc assumptions and (often inaccurate) rules of thumb. Furthermore, the lack of standardization and reproducibility makes file-system benchmarking ineffective. To remedy these problems, we develop Impressions, a framework for generating statistically accurate file-system images with realistic metadata and content. Impressions is flexible: it supports user-specified constraints on various file-system parameters and uses a number of statistical techniques to generate consistent images. In this article, we present the design, implementation, and evaluation of Impressions and demonstrate its utility using desktop search as a case study. We believe Impressions will prove useful to system developers and users alike.
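The abstract says Impressions honors user-specified constraints on file-system parameters while still drawing those parameters from statistical distributions. The Python below is a minimal sketch of what that can involve. It rests on two assumptions that the abstract does not state: file sizes follow a lognormal distribution, and the only constraint is the total number of bytes in the image. The sketch samples a fixed number of file sizes, then greedily resamples one file at a time until the total falls within a tolerance of the target. The names `sample_file_size` and `resolve_total_size` and all parameter values are hypothetical. The greedy loop is a simple stand-in and is not necessarily the technique Impressions itself uses.

```python
import random


def sample_file_size(mu: float, sigma: float, rng: random.Random) -> int:
    """Draw one file size in bytes from a lognormal distribution.

    Assumption: the lognormal model is illustrative only.
    """
    # Clamp to 1 byte so no file becomes empty after truncation to int.
    return max(1, int(rng.lognormvariate(mu, sigma)))


def resolve_total_size(
    n_files: int,
    target_bytes: int,
    mu: float,
    sigma: float,
    tolerance: float = 0.05,
    max_iters: int = 100_000,
    seed: int = 0,
) -> list[int]:
    """Return n_files sizes whose sum lies within tolerance * target_bytes.

    Hypothetical helper: sample every size independently, then repeatedly
    replace one randomly chosen file, keeping the replacement only when it
    moves the total closer to target_bytes.
    """
    rng = random.Random(seed)  # seeded so the generated image is reproducible
    sizes = [sample_file_size(mu, sigma, rng) for _ in range(n_files)]
    total = sum(sizes)
    slack = tolerance * target_bytes

    for _ in range(max_iters):
        if abs(total - target_bytes) <= slack:
            return sizes
        i = rng.randrange(n_files)
        candidate = sample_file_size(mu, sigma, rng)
        new_total = total - sizes[i] + candidate
        # Greedy acceptance: keep the swap only if it narrows the gap.
        if abs(new_total - target_bytes) < abs(total - target_bytes):
            sizes[i] = candidate
            total = new_total

    # The final swap may have met the constraint, so check once more.
    if abs(total - target_bytes) <= slack:
        return sizes
    raise RuntimeError("constraint not satisfied; adjust mu, sigma, or tolerance")


if __name__ == "__main__":
    sizes = resolve_total_size(1000, 12_000_000, mu=8.0, sigma=1.5, seed=42)
    print(len(sizes), sum(sizes))
```

This approach has a cost. Every accepted swap pulls the sample away from an unbiased lognormal draw, so a tight tolerance, or a target far from the distribution's expected total, can noticeably distort the size distribution. A more careful generator would therefore compare the final sizes against the intended distribution before building an image from them.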
