sJournal: A New Design of Journaling for File Systems to Provide Crash Consistency

Maintaining consistency in the presence of system crashes is one of the major challenges faced by modern file systems. File systems have evolved various techniques to provide crash consistency, among which journaling is one of the most important. Unfortunately, journaling introduces a write-twice problem: write traffic is first written to the journal space and is later written back to the file system space. This problem is critical when data management applications require version consistency. To address it, we present sJournal, a smart journaling layer that efficiently provides version consistency to the file systems above it. The key idea of sJournal is to understand the block I/O traffic issued by the upper file system and to redirect that traffic intelligently between the journal space and the file system space. This involves four techniques: 1) detect the upper file system and extract the disk block allocation status, 2) identify all overwrite traffic and log it to the journal space, while issuing non-overwrite traffic directly to the file system space, 3) redirect read traffic to the journal space if the target block is logged, and 4) checkpoint all logged data to the file system space at an appropriate time. We implemented a prototype of sJournal and integrated it with Ext3. Through experiments, we compared the performance of Ext3 running in ordered mode, in data journal mode, and with sJournal. The results show that Ext3 with sJournal support provides performance comparable to ordered journal mode while ensuring the version consistency guaranteed by data journal mode.
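The redirection idea behind techniques 2 to 4 can be illustrated with a minimal in-memory sketch. It is not the sJournal prototype: the class `RedirectLayer` and its methods are hypothetical names chosen for illustration, the allocation status (technique 1) is assumed to be given up front rather than extracted from Ext3, and commit ordering and crash recovery are ignored.

```python
class RedirectLayer:
    """Toy model of a journaling layer that logs only overwrites.

    Hypothetical illustration; names and structure are assumptions,
    not taken from the sJournal implementation.
    """

    def __init__(self, initial_blocks=None):
        # File system space: block number -> data currently on "disk".
        self.fs = dict(initial_blocks or {})
        # Journal space: block number -> newest logged version.
        self.journal = {}
        # Allocation status, assumed known (stands in for technique 1).
        self.allocated = set(self.fs)

    def write(self, block, data):
        if block in self.allocated:
            # Overwrite of live data: log it, leaving the old version
            # in the file system space untouched until checkpoint.
            self.journal[block] = data
        else:
            # Non-overwrite: no old version to protect, write in place once.
            self.fs[block] = data
            self.allocated.add(block)

    def read(self, block):
        # Serve the newest version: the journal copy if the block is logged.
        if block in self.journal:
            return self.journal[block]
        return self.fs.get(block)

    def checkpoint(self):
        # Copy every logged block back to the file system space.
        for block, data in self.journal.items():
            self.fs[block] = data
        self.journal.clear()


layer = RedirectLayer({7: b"old"})
layer.write(7, b"new")    # overwrite: goes to the journal
layer.write(8, b"fresh")  # newly allocated block: goes straight to the file system
layer.checkpoint()        # logged data is written back
```

In this model only the overwrite of block 7 is written twice; the write to the newly allocated block 8 reaches the file system space directly, which is where the saving over full data journaling would come from.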
