Re-Animator: Versatile High-Fidelity Storage-System Tracing and Replaying

Modern applications use storage systems in complex and often surprising ways. Tracing system calls is a common approach to understanding application behavior: traces allow offline analysis and enable replay in other environments. But current system-call tracing tools have drawbacks: (1) they often omit information needed for full analysis, such as raw data buffers; (2) they have high overheads; (3) they often use non-portable trace formats; and (4) they may not offer useful, scalable analysis and replay tools. We have developed Re-Animator, a system-call tracing tool that focuses on storage-related calls and collects maximal information: it captures complete data buffers and writes all traces in the standard DataSeries format. We also created a prototype replayer that focuses on calls related to file-system state. We evaluated our system on long-running server applications such as key-value stores and databases. Our tracer has an average overhead of only 1.8–2.3×, and the overhead can be as low as 5% for I/O-bound applications. Our replayer verifies that its actions are correct and faithfully reproduces the logical file-system state generated by the original application.
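The idea of a replayer that "verifies that its actions are correct" can be made concrete with a small sketch. This is a minimal illustration under stated assumptions, not Re-Animator's implementation. The `TraceRecord` type is hypothetical and is not the DataSeries schema. It holds each call's name, arguments, complete data buffer, and traced return value. The `replay` function is also hypothetical: it re-issues `open`, `write`, and `close` under a scratch directory. File-descriptor numbers can legitimately differ between runs, so the sketch maps traced descriptors to new ones instead of comparing them. For every other call, it raises an error if the replayed return value differs from the traced one.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


# Hypothetical in-memory trace record (an assumption, NOT the DataSeries schema).
@dataclass
class TraceRecord:
    syscall: str          # "open", "write", or "close"
    path: Optional[str]   # path argument of open, relative to the replay root
    fd: Optional[int]     # file descriptor as seen in the traced run
    data: bytes = b""     # complete data buffer captured for write
    flags: int = 0        # open flags
    ret: int = 0          # return value observed during tracing


def replay(records, root):
    """Re-issue each traced call under `root`; raise if a result diverges."""
    fd_map = {}  # traced fd -> fd in the replaying process
    for rec in records:
        if rec.syscall == "open":
            new_fd = os.open(os.path.join(root, rec.path), rec.flags, 0o644)
            # The traced open returned rec.ret as its fd; remember the mapping.
            fd_map[rec.ret] = new_fd
            continue  # fd numbers may differ between runs, so skip the check
        if rec.syscall == "write":
            got = os.write(fd_map[rec.fd], rec.data)
        elif rec.syscall == "close":
            os.close(fd_map.pop(rec.fd))
            got = 0  # os.close returns None; the traced close returned 0
        else:
            raise NotImplementedError(rec.syscall)
        if got != rec.ret:
            raise RuntimeError(f"{rec.syscall}: traced {rec.ret}, replayed {got}")


if __name__ == "__main__":
    trace = [
        TraceRecord("open", "a.txt", None,
                    flags=os.O_WRONLY | os.O_CREAT | os.O_TRUNC, ret=3),
        TraceRecord("write", None, 3, data=b"hello", ret=5),
        TraceRecord("close", None, 3, ret=0),
    ]
    with tempfile.TemporaryDirectory() as root:
        replay(trace, root)
        with open(os.path.join(root, "a.txt"), "rb") as f:
            print(f.read())
```

Capturing the complete data buffer is what lets a replayer reproduce file contents rather than just the sequence of calls. A trace that recorded only write lengths could recreate file sizes but not the data itself.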
