Introduction to the Special Issue on USENIX FAST 2018

This special issue of the ACM Transactions on Storage (TOS) presents some of the highlights of the 16th USENIX Conference on File and Storage Technologies (FAST’18). Over the years, FAST has evolved into a community of researchers and practitioners working on a diverse and expanding set of research topics; the conference represents some of the latest and best work being done, and this year was no different. FAST’18 received a record 139 submissions on topics including non-volatile memory; distributed, cloud, and data center storage; performance and scalability; and experiences with deployed systems. Of these, we selected five high-quality articles for publication in this special issue of ACM TOS.
The first article, which was also selected as one of the best papers at the conference, is “Protocol-Aware Recovery for Consensus-based Storage” by Ramnatthan Alagappan, Aishwarya Ganesan, Eric Lee, Aws Albarghouthi, Vijay Chidambaram, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. The authors demonstrate how storage faults can significantly affect recovery in distributed storage systems that are based on replicated state machines, including ones in widespread use today. They then propose corruption-tolerant replication as a solution that can ensure safe recovery.
The second article is “Efficient Directory Mutations in a Full-Path Indexed File System” by Yang Zhan, Alex Conway, Yizheng Jiao, Eric Knorr, Michael A. Bender, Martin Farach-Colton, William Jannen, Rob Johnson, Donald E. Porter, and Jun Yuan. BetrFS is a file system that offers dramatically faster execution times for common modern-day file-system operations. In this significant update to the design of BetrFS, the authors tackle rename, the last stronghold of performance challenges, with a new “range-rename” mechanism.
The third article is “Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems” by Haryadi S. Gunawi, Riza O. Suminto, Russell Sears, Casey Golliher, Swaminathan Sundararaman, Xing Lin, Tim Emami, Weiguang Sheng, Nematollah Bidokhti, Caitie McCaffrey, Gary Grider, Parks M. Fields, Kevin Harms, Robert B. Ross, Andree Jacobson, Robert Ricci, Kirk Webb, Peter Alvaro, H. Birali Runesha, Mingzhe Hao, and Huaicheng Li. Mysterious storage faults are legendary within the computer industry, increasingly so as the scale of deployed systems grows; this article presents a lively discussion of one such class of faults with significant impact, namely fail-slow. The authors draw on a large-scale study of documented and anecdotal evidence from 101 reports of such incidents, sourced from 12 institutions.
The fourth article, which was also selected as one of the best papers at the conference, is “Bringing Order to Chaos: Barrier-Enabled I/O Stack for Flash Storage” by Youjip Won, Joontaek Oh, Jaemin Jung, Gyeongyeol Choi, Seongbae Son, Jooyoung Hwang, and Sangyeun Cho. The modern storage I/O stack is extremely complex; a large contributor to this complexity is layering and the “impedance mismatch” across layers. The authors of this article revisit this well-trodden space and make an astonishingly original contribution that is both powerful and fundamentally simple, safely extracting the most out of high-performance storage.