The Conquest file system: Better performance through a disk/persistent-RAM hybrid design

Modern file systems assume the use of disk, which has been a system-wide performance bottleneck for over a decade. Current disk caching and RAM file systems either impose high overhead when accessing memory content or provide no mechanism for persisting data across reboots.

The Conquest file system is based on the observation that memory is becoming inexpensive enough to deliver all file system services from memory, except for providing large storage capacity. Unlike caching, Conquest uses battery-backed memory as persistent storage and provides separate, specialized data paths to memory and to disk. As a result, the memory data path contains no disk-related complexity, and the disk data path contains only optimizations for the specialized disk usage pattern.

Compared to a memory-based file system, Conquest incurs little performance overhead. Compared to several disk-based file systems, Conquest is 1.3x to 19x faster on workloads confined to memory and 1.4x to 2.0x faster on workloads that exercise both memory and disk.

Conquest realizes most of the benefits of persistent RAM at a fraction of the cost of a RAM-only solution. It also demonstrates that, in a memory-rich environment, disk-related optimizations impose high overheads on access to memory content.
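To make the idea of separate memory and disk data paths concrete, here is a minimal sketch under stated assumptions. It assumes a simple size-based routing rule: files at or below a threshold live entirely in persistent RAM, while larger files have their data placed on disk. The threshold value, the `HybridStore` class, and its `write`/`read` methods are hypothetical illustrations, not Conquest's code or interfaces, and plain dictionaries stand in for battery-backed RAM and for disk.

```python
from dataclasses import dataclass, field

# Hypothetical threshold (an assumption for illustration only): files at or
# below this size stay in battery-backed RAM; larger files go to disk.
LARGE_FILE_THRESHOLD = 1 << 20  # 1 MiB


@dataclass
class HybridStore:
    """Toy model of separate memory and disk data paths (not Conquest's code)."""
    threshold: int = LARGE_FILE_THRESHOLD
    # Metadata for every file lives in (simulated) persistent RAM: path -> location.
    metadata: dict = field(default_factory=dict)
    # Memory data path: plain in-memory objects, no block or seek logic.
    ram_data: dict = field(default_factory=dict)
    # Disk data path: stands in for a layout tuned only for large files.
    disk_data: dict = field(default_factory=dict)

    def write(self, path: str, data: bytes) -> str:
        """Store the whole file, route it by size, and return its location."""
        if len(data) <= self.threshold:
            self.disk_data.pop(path, None)  # migrate back if the file shrank
            self.ram_data[path] = data
            location = "ram"
        else:
            self.ram_data.pop(path, None)  # migrate out if the file grew
            self.disk_data[path] = data
            location = "disk"
        self.metadata[path] = location
        return location

    def read(self, path: str) -> bytes:
        """Consult RAM-resident metadata, then use exactly one data path."""
        location = self.metadata[path]  # raises KeyError for unknown paths
        if location == "ram":
            return self.ram_data[path]
        return self.disk_data[path]


if __name__ == "__main__":
    store = HybridStore(threshold=4)  # tiny threshold just for the demo
    print(store.write("/etc/motd", b"hi"))            # ram
    print(store.write("/media/clip", b"0123456789"))  # disk
    print(store.read("/etc/motd"))                    # b'hi'
```

The design point the sketch illustrates is that a read of a small file never passes through any disk-oriented logic: the lookup and the data both stay on the memory path, while only files routed to disk would pay for disk-specific handling.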
