Model-based failure analysis of journaling file systems

We propose a novel method for measuring the robustness of journaling file systems under disk write failures. In our approach, we build models of how journaling file systems order disk writes under different journaling modes, and we use these models to inject write failures during file system updates. Using this technique, we analyze whether journaling file systems maintain on-disk consistency in the presence of disk write failures. We apply the technique to three important Linux journaling file systems: ext3, ReiserFS, and IBM JFS. Our analysis identifies several design flaws and correctness bugs in these file systems, which can cause serious errors ranging from data corruption to unmountable file systems.
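To make the idea of model-driven write-failure injection concrete, the following is a minimal sketch rather than the authors' tool. It assumes a hypothetical, heavily simplified write ordering for one transaction (the block-type names in `ORDERED_MODE`), a toy disk that fails every write of one chosen block type, and a toy consistency rule: the blocks that reach disk must form a prefix of the modeled order. The names `FaultyDisk`, `run_transaction`, `is_consistent`, and `analyze` are all illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified write ordering for one transaction in an
# "ordered" journaling mode. The block-type names are illustrative only.
ORDERED_MODE = ["data", "journal_desc", "journal_meta", "journal_commit", "checkpoint"]


@dataclass
class FaultyDisk:
    """Toy disk that fails every write of one chosen block type."""
    fail_type: str
    written: list = field(default_factory=list)

    def write(self, block_type: str) -> bool:
        if block_type == self.fail_type:
            return False  # injected write failure
        self.written.append(block_type)
        return True


def run_transaction(disk, order, stop_on_error):
    """Issue writes in model order; optionally abort at the first failure."""
    for block_type in order:
        ok = disk.write(block_type)
        if not ok and stop_on_error:
            return "aborted"
    return "completed"


def is_consistent(written, order):
    """Toy invariant (an assumption): on-disk blocks form a prefix of the
    modeled order, so no write landed after an earlier one was lost."""
    return written == order[:len(written)]


def analyze(order, stop_on_error):
    """Inject one failure per block type and record whether the toy
    invariant still holds afterwards."""
    results = {}
    for fail_type in order:
        disk = FaultyDisk(fail_type)
        run_transaction(disk, order, stop_on_error)
        results[fail_type] = is_consistent(disk.written, order)
    return results


if __name__ == "__main__":
    print("abort on error: ", analyze(ORDERED_MODE, stop_on_error=True))
    print("ignore errors:  ", analyze(ORDERED_MODE, stop_on_error=False))
```

In this toy model, a file system that aborts the transaction at the first failed write always leaves a prefix of the modeled order on disk, whereas one that ignores write errors can, for example, write a commit block after a journal block was lost. Real file systems and real consistency checks are considerably more involved than this prefix rule.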
