Paper Information - z-READ: Towards Efficient and Transparent Zero-Copy Read

z-READ: Towards Efficient and Transparent Zero-Copy Read

In cloud computing, I/O-intensive workloads can be co-located with other applications or virtual machines on a single physical machine. In this case, copy-based I/O (buffered I/O) can cause severe performance interference for co-located memory-intensive workloads. This is because buffered I/O consumes memory bandwidth during memory copies, even though it benefits from caching. To address this problem, many zero-copy I/O schemes have been proposed, but none of them provides both 1) transparent copy avoidance through the read/write system calls and 2) the benefits of kernel-level caching at the same time. To this end, this paper presents z-READ, an efficient and transparent zero-copy read I/O scheme based on page remapping and copy-on-write techniques. The scheme introduces several optimizations that minimize the overhead of page remapping by reducing the number of remote TLB shootdowns. We implement a z-READ prototype in the memory management subsystem of Linux kernel 4.12.9. Our experimental results show that co-located memory-intensive workloads can be negatively affected by I/O-intensive workloads under copy-based I/O (up to 1.96x slowdown in in-memory configurations), while z-READ incurs at most a 1.07x slowdown in the same configurations.
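The contrast between copy-based reads and copy-on-write page mapping can be illustrated with a rough user-space analogy. This is only a sketch and not z-READ itself: z-READ works inside the kernel behind the ordinary read system call, whereas the example below uses an explicit private memory mapping (Python's `mmap.ACCESS_COPY`), so it is not transparent to the application. The function names `buffered_read`, `cow_mapped_read`, and `demo` are hypothetical and chosen for illustration.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def buffered_read(path, length):
    """Copy-based read: data is copied into a newly allocated user buffer."""
    with open(path, "rb") as f:
        return f.read(length)


def cow_mapped_read(path, length):
    """Map the file privately (copy-on-write) instead of copying it up front.

    Reads go through the mapping; a page is duplicated only when the
    process first writes to it. This is an analogy, not the z-READ mechanism.
    """
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), length, access=mmap.ACCESS_COPY)


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"A" * PAGE)

        copied = buffered_read(path, PAGE)
        mapped = cow_mapped_read(path, PAGE)
        same_before = bytes(mapped[:PAGE]) == copied

        # The first write modifies a private copy of the page;
        # the file contents are left unchanged.
        mapped[0:1] = b"Z"
        with open(path, "rb") as f:
            on_disk_first = f.read(1)

        result = (same_before, mapped[0:1], on_disk_first)
        mapped.close()
        return result
    finally:
        os.remove(path)
```

Calling `demo()` shows that both paths return the same bytes, and that a write through the private mapping changes only the process's own view while the file keeps its original contents.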
