Aerie: flexible file-system interfaces to storage-class memory

Storage-class memory technologies such as phase-change memory and memristors present a radically different interface to storage than existing block devices. As a result, they provide a unique opportunity to re-examine storage architectures. We find that the existing kernel-based stack of components, well suited for disks, unnecessarily limits the design and implementation of file systems for this new technology. We present Aerie, a flexible file-system architecture that exposes storage-class memory to user-mode programs so they can access files without kernel interaction. Aerie can implement a generic POSIX-like file system with performance similar to or better than a kernel implementation. The main benefit of Aerie, though, comes from enabling applications to optimize the file system interface. We demonstrate a specialized file system that reduces a hierarchical file system abstraction to a key/value store with fewer consistency guarantees but 20-109% higher performance than a kernel file system.

[1]  Winfried W. Wilcke,et al.  Storage-class memory: The next storage system technology , 2008, IBM J. Res. Dev..

[2]  Geoffrey H. Kuenning,et al.  Conquest: Better Performance Through a Disk/Persistent-RAM Hybrid File System , 2002, USENIX Annual Technical Conference, General Track.

[3]  A. L. Narasimha Reddy,et al.  SCMFS: A file system for Storage Class Memory , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[4]  David R. Cheriton,et al.  Leases: an efficient fault-tolerant mechanism for distributed file cache consistency , 1989, SOSP '89.

[5]  J. Howard Et El,et al.  Scale and performance in a distributed file system , 1988 .

[6]  Irving L. Traiger,et al.  Granularity of Locks and Degrees of Consistency in a Shared Data Base , 1998, IFIP Working Conference on Modelling in Data Base Management Systems.

[7]  Byung-Gon Chun,et al.  Usenix Association 10th Usenix Symposium on Operating Systems Design and Implementation (osdi '12) 135 Megapipe: a New Programming Interface for Scalable Network I/o , 2022 .

[8]  Russel Sandberg,et al.  The Sun Network Filesystem: Design, Implementation and Experience , 2001 .

[9]  David Mazières,et al.  A Toolkit for User-Level File Systems , 2001, USENIX Annual Technical Conference, General Track.

[10]  Steven Swanson,et al.  Providing safe, user space access to fast, solid state disks , 2012, ASPLOS XVII.

[11]  Jim Zelenka,et al.  File server scaling with network-attached secure disks , 1997, SIGMETRICS '97.

[12]  Jacob R. Lorch,et al.  A five-year study of file-system metadata , 2007, TOS.

[13]  Ulrich Drepper,et al.  Futexes Are Tricky , 2004 .

[14]  Yiming Huai,et al.  Spin-Transfer Torque MRAM (STT-MRAM): Challenges and Prospects , 2008 .

[15]  Michael Stumm,et al.  FlexSC: Flexible System Call Scheduling with Exception-Less System Calls , 2010, OSDI.

[16]  Peter F. Corbett,et al.  The Direct Access File System , 2003, FAST.

[17]  Michael M. Swift,et al.  Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.

[18]  Scott A. Brandt,et al.  HeRMES: high-performance reliable MRAM-enabled storage , 2001, Proceedings Eighth Workshop on Hot Topics in Operating Systems.

[19]  James Lau,et al.  File System Design for an NFS File Server Appliance , 1994, USENIX Winter.

[20]  Rajesh Gupta,et al.  From ARIES to MARS: transaction support for next-generation, solid-state drives , 2013, SOSP.

[21]  Michael D. Schroeder,et al.  Cooperation of mutually suspicious subsystems in a computer utility , 1972 .

[22]  Jeffrey S. Chase,et al.  Architecture support for single address space operating systems , 1992, ASPLOS V.

[23]  Chandramohan A. Thekkath,et al.  Petal: distributed virtual disks , 1996, ASPLOS VII.

[24]  Sanjeev Kumar,et al.  Finding a Needle in Haystack: Facebook's Photo Storage , 2010, OSDI.

[25]  Todor I. Mollov,et al.  Quill : Exploiting Fast Non-Volatile Memory by Transparently Bypassing the File System , 2013 .

[26]  Adrian Schüpbach,et al.  The multikernel: a new OS architecture for scalable multicore systems , 2009, SOSP '09.

[27]  Robert Tappan Morris,et al.  Ivy: a read/write peer-to-peer file system , 2002, OSDI '02.

[28]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[29]  Thomas R. Gross,et al.  Unified High-Performance I/O: One Stack to Rule Them All , 2013, HotOS.

[30]  Christopher Frost,et al.  Better I/O through byte-addressable, persistent memory , 2009, SOSP '09.

[31]  Paul Barham,et al.  A fresh approach to file system quality of service , 1997, Proceedings of 7th International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV '97).

[32]  D. Stewart,et al.  The missing memristor found , 2008, Nature.

[33]  Jacob R. Lorch,et al.  Farsite: federated, available, and reliable storage for an incompletely trusted environment , 2002, OSDI '02.

[34]  Chandramohan A. Thekkath,et al.  Frangipani: a scalable distributed file system , 1997, SOSP.

[35]  Rajesh K. Gupta,et al.  NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.

[36]  Irving L. Traiger,et al.  The notions of consistency and predicate locks in a database system , 1976, CACM.

[37]  Michael M. Swift,et al.  Efficient virtual memory for big memory servers , 2013, ISCA.

[38]  Peter J. Varman,et al.  Bridging the programming gap between persistent and volatile memory using WrAP , 2013, CF '13.

[39]  Ashok M. Joshi,et al.  Adaptive Locking Strategies in a Multi-node Data Sharing Environment , 1991, VLDB.

[40]  David L. Black,et al.  Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures , 1987, IEEE Trans. Computers.

[41]  Mahadev Satyanarayanan,et al.  Disconnected Operation in the Coda File System , 1999, Mobidata.

[42]  Andrew R. Cherenson,et al.  The Sprite network operating system , 1988, Computer.

[43]  Angela Demke Brown,et al.  Recon: Verifying file system consistency at runtime , 2012, TOS.

[44]  Thorsten von Eicken,et al.  U-Net: a user-level network interface for parallel and distributed computing , 1995, SOSP.

[45]  Vijayalakshmi Srinivasan,et al.  Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[46]  GhemawatSanjay,et al.  The Google file system , 2003 .

[47]  Robert Grimm,et al.  Application performance and flexibility on exokernel systems , 1997, SOSP.