WITCHER: Detecting Crash Consistency Bugs in Non-volatile Memory Programs

The advent of non-volatile main memory (NVM) enables the development of crash-consistent software without paying storage stack overhead. However, building a correct crash-consistent program remains very challenging in the presence of a volatile cache. This paper presents WITCHER, a crash consistency bug detector for NVM software that is (1) scalable: it does not suffer from test space explosion, (2) automatic: it does not require manual source code annotations, and (3) precise: it does not produce false positives. WITCHER first analyzes source code and NVM access traces to infer a set of "likely invariants", conditions that are believed to hold in a crash-consistent program. It then automatically composes NVM images that simulate the potentially inconsistent crash states in which a likely invariant is violated. Finally, WITCHER performs "output equivalence checking": it compares the output of program executions with and without a simulated crash to validate whether each likely-invariant violation under test is a true crash consistency bug. Evaluation with ten persistent data structures, two real-world servers, and five example codes in Intel's PMDK library shows that WITCHER outperforms state-of-the-art tools. WITCHER discovers 37 (32 new) crash consistency bugs, all of which were confirmed.
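
The pipeline described above (likely invariant, crash image composition, output equivalence checking) can be illustrated with a toy sketch. This is not WITCHER's implementation: the two-field record, the trace format, the hard-coded invariant "value must persist before valid", and every function name below are assumptions made for illustration only. A fence is assumed to make all earlier stores durable, and stores issued between two fences are assumed to reach NVM in any order.

```python
from typing import Callable, Dict, List, Optional

Image = Dict[str, int]   # toy NVM image: field name -> persisted value
Op = tuple               # ("store", field, value) or ("fence",)

def buggy_insert(v: int) -> List[Op]:
    # No fence between the stores, so the cache may persist "valid" first.
    return [("store", "value", v), ("store", "valid", 1), ("fence",)]

def fixed_insert(v: int) -> List[Op]:
    # The fence orders "value" before "valid".
    return [("store", "value", v), ("fence",), ("store", "valid", 1), ("fence",)]

def lookup(image: Image) -> Optional[int]:
    # Program output observed after recovery.
    return image["value"] if image["valid"] == 1 else None

def run(image: Image, trace: List[Op]) -> Image:
    # Crash-free execution: every store eventually persists.
    out = dict(image)
    for op in trace:
        if op[0] == "store":
            out[op[1]] = op[2]
    return out

def violating_images(image: Image, trace: List[Op],
                     first: str, second: str) -> List[Image]:
    # Compose crash images in which `second` persisted but `first` did not.
    # Such a state is feasible only if no fence separates the two stores.
    images = []
    persisted = dict(image)      # durable state as of the last fence
    pending: Dict[str, int] = {} # stores issued since the last fence
    for op in trace:
        if op[0] == "fence":
            persisted.update(pending)
            pending = {}
        else:
            pending[op[1]] = op[2]
            if first in pending and second in pending:
                img = dict(persisted)
                img[second] = pending[second]
                images.append(img)
    return images

def check(insert: Callable[[int], List[Op]], v: int) -> List[Image]:
    # Output equivalence: after a crash, lookup must behave as if the
    # insert either never happened or completed entirely.
    initial = {"valid": 0, "value": 0}
    trace = insert(v)
    allowed = {lookup(initial), lookup(run(initial, trace))}
    return [img for img in violating_images(initial, trace, "value", "valid")
            if lookup(img) not in allowed]
```

In this sketch, `check(buggy_insert, 42)` reports one crash image whose output matches neither the pre-insert nor the post-insert execution, while `check(fixed_insert, 42)` reports none, because the fence makes the invariant-violating state unreachable. Only images that violate the invariant are composed, which is the sense in which the approach avoids enumerating every possible crash state.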
