Paper information - Re-Animator: Versatile High-Fidelity Storage-System Tracing and Replaying

Modern applications use storage systems in complex and often surprising ways. Tracing system calls is a common approach to understanding applications' behavior, allowing offline analysis and enabling replay in other environments. But current system-call tracing tools have drawbacks: (1) they often omit some information, such as raw data buffers, needed for full analysis; (2) they have high overheads; (3) they often use non-portable trace formats; and (4) they may not offer useful and scalable analysis and replay tools. We have developed Re-Animator, a powerful system-call tracing tool that focuses on storage-related calls and collects maximal information, capturing complete data buffers and writing all traces in the standard DataSeries format. We also created a prototype replayer that focuses on calls related to file-system state. We evaluated our system on long-running server applications such as key-value stores and databases. Our tracer has an average overhead of only 1.8-2.3×, but the overhead can be as low as 5% for I/O-bound applications. Our replayer verifies that its actions are correct and faithfully reproduces the logical file-system state generated by the original application.
