Evolving Ext4 for Shingled Disks

Drive-managed SMR (Shingled Magnetic Recording) disks offer a plug-compatible, higher-capacity replacement for conventional disks. For non-sequential workloads, these disks show bimodal behavior: after a short period of high throughput, they enter a continuous period of low throughput. We introduce ext4-lazy, a small change to the Linux ext4 file system that significantly improves throughput in both modes. We present benchmarks on four different drive-managed SMR disks from two vendors, showing that ext4-lazy achieves a 1.7-5.4× improvement over ext4 on a metadata-light file server benchmark. On metadata-heavy benchmarks, it achieves a 2-13× improvement over ext4 on drive-managed SMR disks as well as on conventional disks.
