Specifying and Checking File System Crash-Consistency Models

Applications depend on persistent storage to recover state after system crashes. But the POSIX file system interfaces do not define the possible outcomes of a crash. As a result, it is difficult for application writers to correctly understand the ordering of and dependencies between file system operations, which can lead to corrupt application state and, in the worst case, catastrophic data loss. This paper presents crash-consistency models, analogous to memory consistency models, which describe the behavior of a file system across crashes. Crash-consistency models include both litmus tests, which demonstrate allowed and forbidden behaviors, and axiomatic and operational specifications. We present a formal framework for developing crash-consistency models, and a toolkit, called Ferrite, for validating those models against real file system implementations. We develop a crash-consistency model for ext4, and use Ferrite to demonstrate unintuitive crash behaviors of the ext4 implementation. To demonstrate the utility of crash-consistency models to application writers, we use our models to prototype proof-of-concept verification and synthesis tools, as well as new library interfaces for crash-safe applications.
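
To make the idea of a litmus test concrete, the following is a minimal sketch of a toy crash-consistency model. It is not the paper's ext4 model and not part of Ferrite; the function name `crash_states`, the tuple encoding of operations, and the persistence rules are all assumptions chosen for illustration. The toy model assumes that writes to the same file reach disk in program order, that writes to different files may persist in any order, and that a completed `fsync` on a file makes all earlier writes to that file durable.

```python
from itertools import product

def crash_states(program, files):
    """Enumerate on-disk states reachable if a crash occurs after any prefix
    of `program` has executed, under the toy model described above.

    Operations are tuples (hypothetical encoding):
      ("write", file, data)  -- overwrite the file's contents
      ("fsync", file)        -- force earlier writes to `file` to disk
    A state is a tuple of per-file contents; None means "initial contents".
    """
    states = set()
    for k in range(len(program) + 1):          # crash after k operations
        writes = {f: [] for f in files}        # writes issued so far, per file
        durable = {f: 0 for f in files}        # writes guaranteed on disk
        for op in program[:k]:
            if op[0] == "write":
                _, f, data = op
                writes[f].append(data)
            elif op[0] == "fsync":
                durable[op[1]] = len(writes[op[1]])
        # Each file independently persists some in-order prefix of its
        # writes, at least as long as what fsync has made durable.
        choices = [range(durable[f], len(writes[f]) + 1) for f in files]
        for lengths in product(*choices):
            states.add(tuple(
                writes[f][n - 1] if n else None
                for f, n in zip(files, lengths)
            ))
    return states

FILES = ("f", "g")

# Litmus test: can the flag in g be on disk while f's data is missing?
unsafe = [("write", "f", "data"), ("write", "g", "flag")]
safe = [("write", "f", "data"), ("fsync", "f"), ("write", "g", "flag")]

bad_state = (None, "flag")
print("without fsync:", "allowed" if bad_state in crash_states(unsafe, FILES) else "forbidden")
print("with fsync:   ", "allowed" if bad_state in crash_states(safe, FILES) else "forbidden")
```

In this sketch the state where `g` holds the flag but `f` still has its initial contents is allowed without the `fsync` and forbidden with it. That is the kind of allowed or forbidden outcome a litmus test is meant to document. A real model would also need to cover metadata operations, partial writes, and file-system-specific behavior.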
