ffsck: The Fast File System Checker

Failures, errors, and bugs can corrupt file systems and cause data loss, despite the presence of journals and similar preventive techniques. While consistency checkers such as fsck can detect corruption and repair a damaged image, they are generally created as an afterthought, to be run only at rare intervals. As a result, checkers are rarely optimized and run slowly, causing significant downtime for large-scale storage systems. We address this dilemma by treating the checker as a key component of the overall file system, rather than a peripheral add-on. To this end, we present a modified ext3 file system, rext3, that directly supports the fast file-system checker, ffsck. The rext3 file system colocates and self-identifies its metadata blocks, removing the need for costly seeks and tree traversals during checking. These modifications allow ffsck to scan and repair the file system at rates approaching the full sequential bandwidth of the underlying device. In addition, we demonstrate that rext3 generally performs competitively with ext3 and exceeds it in handling random reads and large writes. Finally, we apply our principles to FreeBSD’s FFS file system and its checker, doing so in a lightweight fashion that preserves the file-system layout while still providing some of the performance gains of ffsck.
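The abstract does not specify rext3's on-disk format, so the following Python sketch is illustrative only. It assumes a hypothetical layout in which every metadata block begins with a small self-identifying header (a magic tag, a block type, and the owning inode number) and all metadata blocks sit in one contiguous region. Under those assumptions, a checker can read the region front to back in a single sequential pass. It can classify each block from its own header and then cross-check inode pointers, indirect-block ownership, and a data-block bitmap in memory, without following pointers across the disk. The names `scan`, `check`, `MAGIC`, and the record layouts are all hypothetical.

```python
import struct

# Hypothetical layout for illustration only; rext3's real format differs.
BLOCK_SIZE = 64
MAGIC = 0x52455854                 # arbitrary tag marking a self-identifying metadata block
HEADER = struct.Struct("<IBI")     # magic, block type, owning inode number
INODE_REC = struct.Struct("<III")  # inode number, direct data block, indirect block (0 = none)
TYPE_INODE, TYPE_INDIRECT = 1, 2

def make_block(btype, owner, payload):
    """Build one metadata block: header followed by payload, zero-padded."""
    return (HEADER.pack(MAGIC, btype, owner) + payload).ljust(BLOCK_SIZE, b"\0")

def pack_ptrs(ptrs):
    """Indirect-block payload: a 16-bit count followed by 32-bit block pointers."""
    return struct.pack("<H", len(ptrs)) + struct.pack(f"<{len(ptrs)}I", *ptrs)

def scan(region):
    """One sequential pass over (block number, bytes) pairs; no tree traversal."""
    inodes, indirect = {}, {}
    for blkno, raw in region:
        magic, btype, owner = HEADER.unpack_from(raw)
        if magic != MAGIC:
            continue  # not a metadata block, or a damaged header
        body = raw[HEADER.size:]
        if btype == TYPE_INODE:
            ino, direct, ind = INODE_REC.unpack_from(body)
            inodes[ino] = (direct, ind)
        elif btype == TYPE_INDIRECT:
            (n,) = struct.unpack_from("<H", body)
            indirect[blkno] = (owner, struct.unpack_from(f"<{n}I", body, 2))
    return inodes, indirect

def check(inodes, indirect, bitmap):
    """Cross-check in memory. In this sketch the bitmap tracks data blocks only."""
    errors, used = [], set()
    for ino, (direct, ind) in inodes.items():
        if direct:
            used.add(direct)
        if ind:
            owner, ptrs = indirect.get(ind, (None, ()))
            if owner != ino:  # the block's self-identifying header disagrees
                errors.append(f"inode {ino}: indirect block {ind} claims owner {owner}")
                continue
            used.update(ptrs)
    errors += [f"block {b} in use but not marked in bitmap" for b in sorted(used - bitmap)]
    errors += [f"block {b} marked in bitmap but unreferenced" for b in sorted(bitmap - used)]
    return errors

region = [
    (100, make_block(TYPE_INODE, 12, INODE_REC.pack(12, 500, 101))),
    (101, make_block(TYPE_INDIRECT, 12, pack_ptrs([501, 502]))),
    (102, make_block(TYPE_INODE, 13, INODE_REC.pack(13, 503, 0))),
    (103, bytes(BLOCK_SIZE)),  # unused block: no magic, skipped by scan
]
inodes, indirect = scan(region)
print(check(inodes, indirect, bitmap={500, 501, 502}))
```

The self-identifying header is what lets `check` validate ownership without a second read. In this sketch, a mismatched owner field is detected from data already gathered in the sequential pass, rather than by seeking from each inode to its indirect blocks.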
