Asynchronous I/O Stack: A Low-latency Kernel I/O Stack for Ultra-Low Latency SSDs

Today’s ultra-low latency SSDs can deliver an I/O latency of sub-ten microseconds. With this dramatically shrunken device time, operations inside the kernel I/O stack, which were traditionally considered lightweight, are no longer a negligible portion of the overall I/O latency. This motivates us to reexamine the storage I/O stack design and propose an asynchronous I/O stack (AIOS), in which synchronous operations in the I/O path are replaced by asynchronous ones so that I/O-related CPU operations overlap with device I/O. AIOS also employs a lightweight block layer specialized for NVMe SSDs that works with the page cache but performs no block I/O scheduling or merging, thereby reducing the time a request spends in the block layer. We prototype the proposed asynchronous I/O stack on the Linux kernel and evaluate it with various workloads. Synthetic FIO benchmarks show that the application-perceived I/O latency falls into single-digit microseconds for 4 KB random reads on an Optane SSD, and that the overall I/O latency is reduced by 15–33% across varying block sizes. This latency reduction translates into significant performance improvements for real-world applications as well: an 11–44% IOPS increase on RocksDB and a 15–30% throughput improvement on Filebench and OLTP workloads.
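
The core idea, replacing a synchronous device access with an asynchronous one so that I/O-related CPU work runs while the SSD services the request, can be illustrated with a small user-space sketch. This is only an analogy under assumptions of our own choosing: AIOS itself modifies the in-kernel read/write paths, whereas the sketch below uses Linux io_uring (liburing) from user space, and the device path, queue depth, and cpu_side_work() placeholder are hypothetical.

    /* Illustrative sketch only: submit the device read first, do CPU-side
     * work while the SSD is busy, then reap the completion. AIOS applies
     * the same overlap inside the kernel I/O path. */
    #define _GNU_SOURCE
    #include <liburing.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLK_SIZE 4096

    static void cpu_side_work(void *buf)
    {
        /* Placeholder for the CPU work that would be overlapped with device
         * I/O (in the kernel read path this includes, e.g., page allocation). */
        (void)buf;
    }

    int main(void)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        void *buf;

        if (posix_memalign(&buf, BLK_SIZE, BLK_SIZE))
            return 1;

        /* Hypothetical device path; O_DIRECT bypasses the page cache. */
        int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
        if (fd < 0 || io_uring_queue_init(8, &ring, 0) < 0)
            return 1;

        /* 1. Submit the 4 KB read to the device first... */
        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, BLK_SIZE, 0);
        io_uring_submit(&ring);

        /* 2. ...then perform CPU-side work while the SSD services it. */
        cpu_side_work(buf);

        /* 3. Finally wait for the completion; ideally the device finished
         *    while the CPU work above was running, hiding its latency. */
        if (io_uring_wait_cqe(&ring, &cqe) == 0) {
            printf("read returned %d bytes\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }

        io_uring_queue_exit(&ring);
        close(fd);
        free(buf);
        return 0;
    }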
