Design and evaluation of a user-level file system for fast storage devices

Recently, the use of fast storage devices has been growing rapidly in social network services, cloud platforms, and similar environments. Unfortunately, the traditional Linux I/O stack was designed to maximize performance on disk-based storage. Emerging byte-addressable, low-latency non-volatile memory technologies (e.g., phase-change memory, MRAM, and the memristor) have very different characteristics, so a disk-based I/O stack cannot deliver the performance these devices are capable of. This paper presents a high-performance I/O stack for fast storage devices. Our approach removes the concept of blocks and simplifies the entire I/O path and software stack, leaving only two layers: a byte-capable interface and a byte-aware file system called BAFS. We aim to minimize I/O latency and maximize bandwidth by eliminating unnecessary layers and supporting byte-addressable I/O, without requiring changes to applications. We have implemented a prototype and evaluated its performance with multiple benchmarks. The experimental results show that our I/O stack achieves performance gains of 6.2 times on average, and up to 17.5 times, compared with the existing Linux I/O stack.
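To make the cost of the block abstraction concrete, the following minimal sketch compares the data moved by a small, unaligned write under a block-based interface (which must read-modify-write every touched block) and under a byte-addressable interface (which moves only the requested bytes). It is an illustration under stated assumptions, not the BAFS implementation: the 4 KiB block size, the read-plus-write accounting, and all names (`block_io_bytes`, `BlockDevice`, `ByteDevice`, and so on) are hypothetical.

```python
"""Hypothetical illustration (not BAFS code): data moved by a small write
under a block-based interface vs. a byte-addressable one."""

BLOCK_SIZE = 4096  # assumption: a typical 4 KiB Linux block/page size


def block_io_bytes(offset: int, length: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes in all blocks touched by [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return (last - first + 1) * block_size


def byte_io_bytes(offset: int, length: int) -> int:
    """Bytes a byte-addressable interface transfers: only what was requested."""
    return max(length, 0)


class BlockDevice:
    """Toy block device: a partial update is a read-modify-write of whole blocks."""

    def __init__(self, nblocks: int, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.data = bytearray(nblocks * block_size)
        self.bytes_moved = 0

    def write(self, offset: int, buf: bytes) -> None:
        if not buf:
            return
        bs = self.block_size
        start = (offset // bs) * bs                   # round down to block boundary
        end = -(-(offset + len(buf)) // bs) * bs      # round up to block boundary
        block = bytearray(self.data[start:end])       # read the whole blocks
        block[offset - start:offset - start + len(buf)] = buf  # modify in memory
        self.data[start:end] = block                  # write the whole blocks back
        self.bytes_moved += 2 * (end - start)         # assumed read + write traffic


class ByteDevice:
    """Toy byte-addressable device: stores only the requested bytes."""

    def __init__(self, size: int):
        self.data = bytearray(size)
        self.bytes_moved = 0

    def write(self, offset: int, buf: bytes) -> None:
        self.data[offset:offset + len(buf)] = buf
        self.bytes_moved += len(buf)
```

For example, a 10-byte write at offset 4090 straddles a block boundary, so `block_io_bytes(4090, 10)` covers two 4 KiB blocks (8192 bytes), and the toy `BlockDevice` counts 16384 bytes of read-plus-write traffic, while `ByteDevice` moves just 10 bytes. Real devices, caches, and the paper's measured gains are not modeled here.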
