Performance Analysis of NVMe SSD-Based All-Flash Array Systems

In this paper, we analyze and optimize the I/O latency of a petabyte-scale, high-performance all-flash array (AFA) system based on NVMe SSDs. A flash-based SSD by itself exhibits relatively low and consistent latency, but in AFA systems, where tens or hundreds of SSDs are combined in a single host machine, applications often observe higher and more variable I/O latency than with a standalone SSD. To identify the main sources of these latency fluctuations, we analyze the end-to-end I/O latency characteristics of a real-world AFA system. We find that suboptimal kernel policies, parameters, and configurations seriously degrade I/O response times, causing very long tail latency. Based on our observations, we manually reconfigure several kernel parameters and revise the storage firmware to achieve consistent I/O latency. Our experimental results show that, with a kernel finely tuned for AFA systems, the mean and standard deviation of the maximum latency can be reduced by 8x and 400x, respectively. The findings in this work offer useful guidance for designing system software and operating systems: CPU schedulers need to be revised to take into account the priority of I/O-bound jobs, CPU isolation, and CPU-SSD affinity; moreover, storage housekeeping protocols such as SMART should be improved to avoid long tail latency.
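As a concrete illustration of the kind of tuning the abstract alludes to, the minimal C sketch below pins the calling thread to a fixed CPU via Linux's sched_setaffinity(2), approximating the CPU-SSD affinity idea (keeping I/O submission on a core near the SSD's interrupt vector). The core number is a hypothetical placeholder, not a value taken from the paper.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative sketch only: pin the calling thread to one CPU so that
     * I/O submissions stay on a core close to the target NVMe SSD's IRQ
     * (the paper's "CPU-SSD affinity"). CPU 2 is an assumed placeholder. */
    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(2, &set);  /* hypothetical core serving the SSD's interrupts */

        /* pid 0 means "the calling thread" */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return EXIT_FAILURE;
        }
        printf("pinned to CPU 2; issue I/O from this thread\n");
        return EXIT_SUCCESS;
    }

In practice the right core would be chosen by inspecting the SSD's interrupt-to-CPU mapping (e.g., under /proc/interrupts) rather than hard-coding it as done here for brevity.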
