Performance and protection in the ZoFS user-space NVM file system

Non-volatile memory (NVM) can be accessed directly from user space without going through the kernel, which has encouraged several recent studies on building user-space NVM file systems. However, for the sake of file system protection, none of the existing file systems grants user-space file system libraries direct control over both the metadata and the data in NVM, leaving fast NVM resources underexploited. Based on the observation that applications tend to group files with similar access permissions within the same directory, and that permission changes are rare, this paper proposes a new abstraction called a coffer, which is a collection of isolated NVM resources, and shows its merits for building a performant and protected NVM file system in user space. The key idea is to separate NVM protection from NVM management via coffers, so that user-space libraries can take full control of the NVM within a coffer while the kernel guarantees strict isolation among coffers. Based on coffers, we build an NVM file system architecture that brings the high performance of NVM to unmodified dynamically linked applications and facilitates the development of performant and flexible user-space NVM file system libraries. With an example file system called ZoFS, we show that user-space file systems built upon coffers can outperform existing NVM file systems in both benchmarks and real-world applications.
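
The split between protection and management described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ZoFS's actual interface: the names `Coffer`, `ToyKernel`, `ToyUserLib`, `create_coffer`, `map_coffer`, and `create_file` are hypothetical, NVM is modeled as a set of page numbers, and the permission check is reduced to an owner ID plus a single flag.

```python
from dataclasses import dataclass, field


@dataclass
class Coffer:
    """A group of NVM pages that share one owner and one permission (toy model)."""
    coffer_id: int
    owner_uid: int
    world_accessible: bool
    pages: set = field(default_factory=set)


class ToyKernel:
    """Protection only: decides which pages belong to which coffer and who may map them."""

    def __init__(self, total_pages):
        self.free_pages = set(range(total_pages))
        self.coffers = {}
        self.next_id = 0

    def create_coffer(self, uid, n_pages, world_accessible=False):
        if n_pages > len(self.free_pages):
            raise MemoryError("not enough free NVM pages")
        # Each pop() removes a distinct page from the free pool.
        pages = {self.free_pages.pop() for _ in range(n_pages)}
        coffer = Coffer(self.next_id, uid, world_accessible, pages)
        self.coffers[coffer.coffer_id] = coffer
        self.next_id += 1
        return coffer.coffer_id

    def map_coffer(self, uid, coffer_id):
        """Hand out the coffer's pages if uid may access it; otherwise refuse."""
        coffer = self.coffers[coffer_id]
        if uid != coffer.owner_uid and not coffer.world_accessible:
            raise PermissionError("coffer is isolated from this user")
        return frozenset(coffer.pages)


class ToyUserLib:
    """Management only: lays out files inside the pages of one mapped coffer."""

    def __init__(self, kernel, uid, coffer_id):
        self.pages = sorted(kernel.map_coffer(uid, coffer_id))
        self.files = {}  # file name -> page number, decided entirely in user space

    def create_file(self, name):
        used = set(self.files.values())
        for page in self.pages:
            if page not in used:
                self.files[name] = page  # no kernel call on this path
                return page
        raise MemoryError("coffer is full")
```

In this model the kernel is involved only when a coffer is created or mapped; file creation inside a mapped coffer is handled by the library alone, and a user who does not own a coffer cannot map it unless the coffer is marked as accessible to everyone.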
