z-READ: Towards Efficient and Transparent Zero-Copy Read

In cloud computing, I/O-intensive workloads can be co-located with other applications or virtual machines on a single physical machine. In this case, copy-based I/O (buffered I/O) can cause severe performance interference with co-located memory-intensive workloads, because buffered I/O consumes memory bandwidth during memory copies even though it benefits from caching. To address this problem, many zero-copy I/O schemes have been proposed, but none of them provides both 1) transparent copy avoidance through the read/write system calls and 2) the benefits of kernel-level caching at the same time. To this end, this paper presents z-READ, an efficient and transparent zero-copy read I/O scheme based on page remapping and copy-on-write techniques. In our scheme, we introduce several optimizations that minimize the overhead of page remapping by reducing the number of remote TLB shootdowns. We implement a z-READ prototype in the memory management subsystem of Linux kernel 4.12.9. Our experimental results show that co-located memory-intensive workloads can be slowed down by I/O-intensive workloads under copy-based I/O (up to a 1.96x slowdown in in-memory configurations), while z-READ incurs at most a 1.07x slowdown in the corresponding configurations.
