H-NVMe: A hybrid framework of NVMe-based storage system in cloud computing environment

In 2017, more and more datacenters started to replace traditional SATA and SAS SSDs with NVMe SSDs, owing to NVMe's outstanding performance [1]. However, for historical reasons, current popular deployments of NVMe in VM-hypervisor-based platforms (such as VMware ESXi [2]) introduce a number of intermediate queues along the I/O stack. As a result, performance is bottlenecked by synchronization locks on these queues, cross-VM interference inflates I/O latency, and, most importantly, the up-to-64K-queue capability of NVMe SSDs cannot be fully utilized. In this paper, we develop a hybrid framework for NVMe-based storage systems called "H-NVMe", which provides two VM I/O stack deployment modes: "Parallel Queue Mode" and "Direct Access Mode". The first mode increases parallelism and enables lock-free operations by implementing local lightweight queues in the NVMe driver. The second mode goes further and bypasses the entire I/O stack in the hypervisor layer, allowing trusted user applications whose hosting VMDK (Virtual Machine Disk) files are attached with our customized vSphere IOFilters [3] to access NVMe SSDs directly, improving performance isolation. This mode suits premium users who have higher priority and the permission to attach the IOFilter to their VMDKs. H-NVMe is implemented on VMware ESXi 6.0.0, and our evaluation results show that the proposed H-NVMe framework can significantly improve throughput and bandwidth compared to the original inbox NVMe solution.
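The core idea of Parallel Queue Mode can be sketched as a toy model: give each VM worker its own lightweight submission queue, mapped onto one of the NVMe device's hardware queues, so that enqueues never contend on a shared lock. The class and method names below are illustrative only (they are not part of the H-NVMe implementation, which lives inside the ESXi NVMe driver):

```python
from collections import deque

class ParallelQueueSketch:
    """Illustrative model of per-worker lightweight queues.

    Each worker is statically mapped to one NVMe hardware submission
    queue (worker_id % num_hw_queues), so no two workers that map to
    different queues ever touch the same structure -- the enqueue path
    needs no global synchronization lock.
    """

    def __init__(self, num_hw_queues):
        # One local lightweight queue per NVMe hardware submission queue.
        self.queues = [deque() for _ in range(num_hw_queues)]

    def submit(self, worker_id, io_request):
        # A worker always enqueues into its own mapped queue; returns
        # the hardware queue index the request was placed on.
        qid = worker_id % len(self.queues)
        self.queues[qid].append(io_request)
        return qid

    def drain(self, hw_queue_id):
        # The driver services each hardware queue independently,
        # preserving per-queue FIFO order.
        q = self.queues[hw_queue_id]
        done = list(q)
        q.clear()
        return done
```

A static `worker_id % num_hw_queues` mapping is the simplest policy that removes queue contention; a real driver would also bound queue depth and balance load across hardware queues.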

[1] Christopher Frost et al. Better I/O through byte-addressable, persistent memory, 2009, SOSP '09.

[2] Heon Young Yeom et al. Shedding Light in the Black-Box: Structural Modeling of Modern Disk Drives, 2007, 2007 15th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems.

[3] Frank Hady et al. When poll is better than interrupt, 2012, FAST.

[4] Hyeonsang Eom et al. Enhancing the I/O system for virtual machines using high performance SSDs, 2014, 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC).

[5] Michael M. Swift et al. Aerie: flexible file-system interfaces to storage-class memory, 2014, EuroSys '14.

[6] H. Howie Huang et al. Falcon: Scaling IO Performance in Multi-SSD Volumes, 2017, USENIX Annual Technical Conference.

[7] Yufeng Wang et al. Improving Virtual Machine Migration via Deduplication, 2014, 2014 IEEE 11th International Conference on Mobile Ad Hoc and Sensor Systems.

[8] Rino Micheloni et al. Inside Solid State Drives (SSDs), 2012.

[9] Jiayin Wang et al. AutoReplica: Automatic data replica manager in distributed caching and data processing systems, 2016, 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC).

[10] Teng Wang et al. EA2S2: An Efficient Application-Aware Storage System for Big Data Processing in Heterogeneous Clusters, 2017, 2017 26th International Conference on Computer Communication and Networks (ICCCN).

[11] John Wilkes et al. An introduction to disk drive modeling, 1994, Computer.

[12] Hyeonsang Eom et al. Optimizing the Block I/O Subsystem for Fast Storage Devices, 2014, ACM Trans. Comput. Syst.

[13] Steven Swanson et al. AutoTiering: Automatic data placement manager in multi-tier all-flash datacenter, 2017, 2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC).

[14] Jiayin Wang et al. AutoReplica: Automatic data replica manager in distributed caching and data processing systems, 2016.

[15] Mohammad Arjomand et al. Reducing access latency of MLC PCMs through line striping, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[16] Sang Lyul Min et al. Ozone (O3): An Out-of-Order Flash Memory Controller Architecture, 2011, IEEE Transactions on Computers.

[17] Heng-Yuan Lee et al. A 4Mb embedded SLC resistive-RAM macro with 7.2ns read-write random-access time and 160ns MLC-access capability, 2011, 2011 IEEE International Solid-State Circuits Conference.

[18] Suman Nath et al. FlashBlox: Achieving Both Performance Isolation and Uniform Lifetime for Virtualized SSDs, 2017, FAST.

[19] Woong Shin. OS I/O path optimizations for flash solid-state drives, 2017.

[20] Alexander Benlian et al. The effect of free sampling strategies on freemium conversion rates, 2017, Electron. Mark.

[21] Teng Wang et al. AutoPath: Harnessing Parallel Execution Paths for Efficient Resource Allocation in Multi-Stage Big Data Frameworks, 2017, 2017 26th International Conference on Computer Communication and Networks (ICCCN).

[22] Xiaoyun Zhu et al. Improving Flash Resource Utilization at Minimal Management Cost in Virtualized Flash-Based Storage Systems, 2017, IEEE Transactions on Cloud Computing.

[23] Bo Sheng et al. EDOS: Edge Assisted Offloading System for Mobile Devices, 2017, 2017 26th International Conference on Computer Communication and Networks (ICCCN).

[24] Jihong Kim et al. Improving I/O Resource Sharing of Linux Cgroup for NVMe SSDs on Multi-core Systems, 2016, HotStorage.

[25] Rajesh K. Gupta et al. Moneta: A High-Performance Storage Array Architecture for Next-Generation, Non-volatile Memories, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[26] Miriam Leeser et al. Accelerating big data applications using lightweight virtualization framework on enterprise cloud, 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).

[27] Hyojun Kim et al. Evaluating Phase Change Memory for Enterprise Storage Systems: A Study of Caching and Tiering Approaches, 2014, TOS.

[28] Ningfang Mi et al. Understanding performance of I/O intensive containerized applications for NVMe SSDs, 2016, 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC).

[29] Minho Lee et al. Improving read performance by isolating multiple queues in NVMe SSDs, 2017, IMCOM.

[30] Arun Jagatheesan et al. Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[31] Mrinmoy Ghosh et al. A Fresh Perspective on Total Cost of Ownership Models for Flash Storage in Datacenters, 2016, 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom).

[32] Sang-Won Lee et al. A survey of Flash Translation Layer, 2009, J. Syst. Archit.

[33] Miriam Leeser et al. FIM: Performance Prediction for Parallel Computation in Iterative Data Processing Applications, 2017, 2017 IEEE 10th International Conference on Cloud Computing (CLOUD).

[34] Teng Wang et al. eSplash: Efficient speculation in large scale heterogeneous computing systems, 2016, 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC).

[35] Steven Swanson et al. Providing safe, user space access to fast, solid state disks, 2012, ASPLOS XVII.

[36] Onur Mutlu et al. Architecting phase change memory as a scalable DRAM alternative, 2009, ISCA '09.

[37] Bo Sheng et al. GReM: Dynamic SSD resource allocation in virtualized storage systems with heterogeneous IO workloads, 2016, 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC).

[38] Jin-Soo Kim et al. NVMeDirect: A User-space I/O Framework for Application-specific Optimization on NVMe SSDs, 2016, HotStorage.

[39] H.-S. Philip Wong et al. Phase Change Memory, 2010, Proceedings of the IEEE.

[40] Teng Wang et al. SEINA: A stealthy and effective internal attack in Hadoop systems, 2017, 2017 International Conference on Computing, Networking and Communications (ICNC).

[41] Mark Lillibridge et al. Understanding the robustness of SSDs under power fault, 2013, FAST.