NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories

Fast non-volatile memories (NVMs) will soon appear on the processor memory bus alongside DRAM. The resulting hybrid memory systems will provide software with sub-microsecond, high-bandwidth access to persistent data, but managing, accessing, and maintaining consistency for data stored in NVM raises a host of challenges. Existing file systems built for spinning or solid-state disks introduce software overheads that would obscure the performance that NVMs should provide, but proposed file systems for NVMs either incur similar overheads or fail to provide the strong consistency guarantees that applications require. We present NOVA, a file system designed to maximize performance on hybrid memory systems while providing strong consistency guarantees. NOVA adapts conventional log-structured file system techniques to exploit the fast random access that NVMs provide. In particular, it maintains separate logs for each inode to improve concurrency, and stores file data outside the log to minimize log size and reduce garbage collection costs. NOVA's logs provide metadata, data, and mmap atomicity and focus on simplicity and reliability, keeping complex metadata structures in DRAM to accelerate lookup operations. Experimental results show that in write-intensive workloads, NOVA provides 22% to 216× throughput improvement compared to state-of-the-art file systems, and 3.1× to 13.5× improvement compared to file systems that provide equally strong data consistency guarantees.
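The abstract names three mechanisms: a private log for each inode, file data kept outside the log, and lookup structures held only in DRAM. The Python sketch below is a toy in-memory model of how those pieces could fit together. It is not NOVA's kernel implementation. `ToyNova`, `InodeLog`, `WriteEntry`, the page-granular `write` signature, the `crash_before_commit` flag, and the 4 KiB page size are all hypothetical. The sketch also assumes two things the abstract does not spell out: new data goes to freshly allocated pages (copy-on-write), and a write commits by a single update of the log's tail pointer. Persistence is simulated with ordinary Python lists. Cache-line flushes, fences, garbage collection, and reclaiming superseded pages are not modeled.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this toy model


@dataclass
class WriteEntry:
    """Hypothetical log entry: it only points at data pages stored outside the log."""
    pgoff: int    # first file page this write covers
    blocks: list  # indices of freshly allocated data pages (copy-on-write)


@dataclass
class InodeLog:
    entries: list = field(default_factory=list)  # appended entries (simulated NVM)
    tail: int = 0  # committed tail; only entries[:tail] survive a crash


class ToyNova:
    def __init__(self):
        self.pages = []  # simulated NVM data pages; never overwritten in place
        self.logs = {}   # inode number -> that inode's own log
        self.index = {}  # DRAM only: inode number -> {file page -> data page}

    def write(self, ino, pgoff, pages, crash_before_commit=False):
        log = self.logs.setdefault(ino, InodeLog())
        # 1. Copy-on-write: place the new data in fresh pages outside the log.
        blocks = []
        for p in pages:
            assert len(p) <= PAGE_SIZE
            self.pages.append(bytes(p))
            blocks.append(len(self.pages) - 1)
        # 2. Append a small entry that references those pages.
        del log.entries[log.tail:]  # drop any entry left uncommitted by a crash
        log.entries.append(WriteEntry(pgoff, blocks))
        if crash_before_commit:
            # Simulated power loss: entry written, tail not advanced, write invisible.
            return
        # 3. Commit: advancing the tail is the single atomic step.
        log.tail = len(log.entries)
        # 4. Update the volatile DRAM index used for fast lookups.
        self._apply(ino, log.entries[-1])

    def _apply(self, ino, entry):
        idx = self.index.setdefault(ino, {})
        for i, b in enumerate(entry.blocks):
            idx[entry.pgoff + i] = b

    def read_page(self, ino, pgoff):
        b = self.index.get(ino, {}).get(pgoff)
        return None if b is None else self.pages[b]

    def recover(self):
        """Rebuild the DRAM index by replaying each inode's committed entries."""
        self.index = {}
        for ino, log in self.logs.items():
            for entry in log.entries[:log.tail]:
                self._apply(ino, entry)


if __name__ == "__main__":
    fs = ToyNova()
    fs.write(1, 0, [b"A" * PAGE_SIZE])                            # committed
    fs.write(1, 0, [b"B" * PAGE_SIZE], crash_before_commit=True)  # interrupted
    fs.recover()                                                  # "remount"
    print(fs.read_page(1, 0)[:1])  # prints b'A'
```

Here `recover()` stands in for remounting after power loss. Only entries up to the committed tail are replayed, so the interrupted overwrite never becomes visible, and the next write to inode 1 discards the stale entry. Because each entry is a short pointer record rather than a copy of the data, the log stays small. In this model, writes to different inodes touch disjoint logs. This is the concurrency benefit the abstract attributes to per-inode logs.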
