vRead: Efficient Data Access for Hadoop in Virtualized Clouds

With its virtually unlimited scalability and on-demand access to computation and storage, a virtualized cloud platform is a natural match for big data systems such as Hadoop. However, virtualization introduces significant overhead for I/O-intensive applications, caused by device virtualization and by scheduling delays for VMs and I/O threads. In particular, device virtualization incurs substantial CPU overhead because I/O data must cross several protection boundaries. We observe that this overhead especially hurts the I/O performance of the Hadoop Distributed File System (HDFS): data read from an HDFS datanode VM must traverse virtual devices multiple times, incurring non-negligible virtualization overhead, even when the client VM and the datanode VM run on the same physical machine. In this paper, we propose vRead, a programmable framework that connects the I/O flows of HDFS applications directly to their data. vRead enables direct "reads" of datanode VMs' disk images from the hypervisor. By doing so, vRead avoids most of the device virtualization overhead, improving I/O throughput and saving CPU cycles for Hadoop workloads and other applications that rely on HDFS.
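To make the idea concrete, below is a minimal C sketch of a hypervisor-level "direct read": instead of routing data through the datanode VM's virtual devices, the host reads the requested bytes straight out of the datanode's disk image file. The sketch is illustrative only and not the paper's actual implementation; it assumes a raw image format, and the names (vread_block, resolve_extent) and the fixed-extent stub are hypothetical.

/*
 * Illustrative sketch of a hypervisor-level direct read from a datanode
 * VM's disk image, bypassing the virtual device path. Assumes a raw image;
 * all names here are hypothetical, not the paper's API.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* A guest file extent resolved to a location inside the raw disk image. */
struct vread_extent {
    off_t  image_offset;   /* byte offset within the datanode VM's image */
    size_t length;         /* contiguous bytes belonging to the HDFS block */
};

/* Hypothetical resolver: a real system would translate a guest-visible HDFS
 * block file into extents of the disk image (e.g., via guest filesystem
 * metadata). This stub just returns a fixed extent for illustration. */
static int resolve_extent(const char *block_path, struct vread_extent *ext)
{
    (void)block_path;
    ext->image_offset = 0;
    ext->length = 4096;
    return 0;
}

/* Read HDFS block data directly from the datanode VM's disk image file,
 * without crossing the guest's virtual I/O stack. */
static ssize_t vread_block(const char *image_path, const char *block_path,
                           char *buf, size_t buf_len)
{
    struct vread_extent ext;
    if (resolve_extent(block_path, &ext) != 0)
        return -1;

    int fd = open(image_path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t want = ext.length < buf_len ? ext.length : buf_len;
    ssize_t n = pread(fd, buf, want, ext.image_offset);  /* host-side read */
    close(fd);
    return n;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <datanode-image> <guest-block-path>\n", argv[0]);
        return 1;
    }
    char buf[4096];
    ssize_t n = vread_block(argv[1], argv[2], buf, sizeof(buf));
    printf("read %zd bytes directly from the disk image\n", n);
    return n < 0;
}

The key design point the sketch captures is that the data path stays on the host: one pread() on the image file replaces multiple crossings of guest/hypervisor protection boundaries.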
