Max: A Multicore-Accelerated File System for Flash Storage

The bandwidth of flash storage has been surging in recent years. Employing multiple cores to fully unleash this abundant bandwidth has become a necessary step toward building high-performance storage systems. This paper presents the design and implementation of Max, a multicore-friendly log-structured file system (LFS) for flash storage. With three main techniques, Max systematically improves the scalability of LFS while retaining its flash-friendly design. First, we propose a new reader-writer semaphore that scales user I/Os with negligible impact on the internal operations of LFS. Second, we introduce the file cell to scale access to the in-memory index and cache while delivering a concurrency- and flash-friendly on-disk layout. Third, to fully exploit flash parallelism, we extend the single-log design with runtime-independent log partitions and defer the ordering and consistency guarantees to crash recovery. We implement Max based on F2FS in the Linux kernel. Evaluations show that Max significantly improves scalability and achieves an order of magnitude higher throughput than existing Linux file systems.
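
The third technique, deferring cross-partition ordering to crash recovery, can be illustrated with a small sketch. The abstract does not say how Max stamps or merges log records, so everything below is an assumption: `PartitionedLog`, `LogRecord`, and `recover` are hypothetical names, every record carries a global version number taken from a shared counter, and each partition is written by one thread at a time so its records are already in version order. During normal operation a writer appends only to its own partition without coordinating with the others; recovery performs a k-way merge of all partitions by version and replays the records so that the newest write to each block wins.

```python
import heapq
import itertools
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    version: int  # global order stamp assigned at write time (assumption)
    inode: int
    block: int
    data: bytes


class PartitionedLog:
    """Hypothetical sketch: N independent append-only log partitions.

    Writers append to their own partition without touching the others;
    only a version stamp is shared. The global order across partitions
    is reconstructed at recovery time rather than at write time.
    """

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]
        self._version = itertools.count(1)
        self._version_lock = threading.Lock()

    def _next_version(self) -> int:
        # Make version stamping atomic so no two records share a version.
        with self._version_lock:
            return next(self._version)

    def append(self, partition: int, inode: int, block: int, data: bytes) -> int:
        # Assumes one writer per partition at a time, so each partition
        # stores its records in increasing version order.
        version = self._next_version()
        self.partitions[partition].append(LogRecord(version, inode, block, data))
        return version

    def recover(self) -> dict:
        """Merge all partitions by version and replay; the newest write wins."""
        state = {}
        # k-way merge: valid because every partition is individually sorted.
        for rec in heapq.merge(*self.partitions, key=lambda r: r.version):
            state[(rec.inode, rec.block)] = rec.data
        return state


if __name__ == "__main__":
    log = PartitionedLog(num_partitions=2)
    log.append(0, inode=7, block=0, data=b"v1")  # version 1, partition 0
    log.append(1, inode=7, block=0, data=b"v2")  # version 2, partition 1
    log.append(0, inode=8, block=3, data=b"x")   # version 3, partition 0
    print(log.recover())  # {(7, 0): b'v2', (8, 3): b'x'}
```

The lock only makes version stamping atomic; Python threads will not show a multicore speedup, so the sketch is about ordering logic rather than performance. A real LFS would also have to decide what to do when a crash persists a record in one partition but loses an earlier record it depends on in another, which this sketch ignores.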
