NVMe-over-fabrics performance characterization and the path to low-overhead flash disaggregation

Storage disaggregation separates compute and storage into different nodes to allow independent resource scaling and, thus, better hardware resource utilization. While disaggregation of hard-drive storage is common practice, NVMe-SSD (i.e., PCIe-based SSD) disaggregation is considered more challenging. Because SSDs are significantly faster than hard drives, the latency overheads (due to both network and CPU processing), as well as the extra compute cycles needed for the offloading stack, become much more pronounced. In this work, we characterize the overheads of NVMe-SSD disaggregation. We show that NVMe-over-Fabrics (NVMf), a recently released remote storage protocol specification, reduces the overheads of remote access to a bare minimum, thus greatly increasing the cost-efficiency of flash disaggregation. Specifically, while recent work showed that SSD storage disaggregation via iSCSI degrades application-level throughput by 20%, we report negligible performance degradation with NVMf, both under stress tests and with a more realistic KV-store workload.
