Small I/O requests are important for many modern workloads in the data center. Traditionally, storage systems have delivered only low rates for small I/O operations because of hard disk drive (HDD) limitations: a single spindle is capable of only about 100-150 IOPS (I/O operations per second). Host CPU processing capacity and network link throughput have therefore been relatively abundant at these low rates. With new storage device technologies, such as NAND Flash solid state drives (SSDs) and non-volatile memory (NVM), it is becoming common to design storage systems that can support millions of small IOPS. At these rates, however, the server CPU and the network protocol emerge as the main bottlenecks to achieving high rates for small I/O requests. Most storage systems in data centers deliver I/O operations over some network protocol. Although there has been extensive work on low-latency, high-throughput networks, such as InfiniBand, Ethernet has dominated the data center. In this work we examine how networked storage protocols over raw Ethernet can achieve low host CPU overhead and high network link efficiency for small I/O requests. We first analyze in detail the latency and overhead of a networked storage protocol running directly over Ethernet and point out its main inefficiencies. We then examine how storage protocols can take advantage of context switch elimination and adaptive batching to reduce CPU and network overhead. Our results show that raw Ethernet is appropriate for supporting fast storage systems. For 4kB requests, we reduce server CPU overhead by up to 45% and improve link utilization by up to 56%, achieving more than 88% of the theoretical link throughput. Effectively, at the same CPU utilization, our techniques serve 56% more I/O operations over a 10Gbits/s link than a baseline protocol without these optimizations. Overall, to the best of our knowledge, this is the first work to present a system that achieves 14μs host CPU overhead on both the initiator and the target for small networked I/Os over raw Ethernet without hardware support. In addition, our approach achieves 287K 4kB IOPS out of the 315K IOPS that are theoretically possible over a 1.2GBytes/s link.
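
As a rough sanity check, the quoted 315K ceiling can be reconstructed with simple arithmetic; the following is a sketch under the assumption that the 1.2GBytes/s link rate is interpreted in binary units (GiB/s) and that each request carries a 4KiB (4096-byte) payload:

\[
\mathrm{IOPS}_{\max} \approx \frac{1.2 \times 2^{30}\ \mathrm{bytes/s}}{4096\ \mathrm{bytes/request}} \approx 314{,}573 \approx 315\mathrm{K}
\]

Under this reading, the 287K IOPS achieved corresponds to roughly 91% of the theoretical maximum, consistent with the claim of exceeding 88% of the theoretical link throughput.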