A low-latency storage stack for fast storage devices

Modern storage systems face the challenge of making the best use of fast storage devices. Although the underlying devices keep improving, the traditional storage stack fails to exploit their characteristics because it was optimized specifically for hard disk drives. In this article, we optimize the storage stack to maximize the benefit of the low latency that fast storage devices provide. Our approach simplifies the I/O path from the application to the fast storage device by removing inefficient layers and the conventional block I/O interface. The proposed stack consists of three layers: an optimized device driver, a low-latency file system called L2FS, and a simplified VFS. The device driver exposes a simple file I/O API to the file system in place of the existing block I/O API. L2FS, a variant of EXT4, performs low-latency I/O operations through this file I/O API. We implement our storage stack on Linux 3.14.3 and evaluate it with multiple benchmarks. The results show that, compared to the existing storage stack on fast storage, our system improves throughput by up to 6.6 times and reduces latency by 54% on average.
