LVD: lean virtual disks

In this work, we present Lean Virtual Disks (LVD), a new virtual disk format for virtualized servers. LVD transparently consolidates duplicate blocks across virtual machines (VMs) to create a lean disk image, giving all VMs a merged datapath. This merged datapath enables efficient storage usage, reduces disk I/O (both reads and writes) by eliminating I/O for content that is identical across VMs, and makes host cache utilization more efficient. LVD is motivated by clouds, where VMs are created from golden masters and use standardized middleware and management tools, leading to high content similarity. We implement LVD as an extension of QCow2 and study its ability to improve common data center system management activities as well as the application performance of popular I/O benchmark workloads. We observed that LVD reduced disk space and disk I/O by 70%, making applications run 25% faster on average.
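The idea of consolidating duplicate blocks across VMs can be illustrated with a minimal content-addressed sketch. This is not the LVD on-disk format or the QCow2 extension described above. The names `SharedBlockStore` and `LeanImage`, the 4 KiB block size, and the use of SHA-256 are all assumptions made for illustration only.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed; the abstract does not state a block size


class SharedBlockStore:
    """Hypothetical store holding each distinct block once for all VMs."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes (one physical copy)
        self.refcount = {}  # digest -> number of logical blocks pointing here

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        return digest

    def get(self, digest):
        return self.blocks[digest]

    def release(self, digest):
        # Drop one reference; free the block when nothing points to it.
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blocks[digest]


class LeanImage:
    """Hypothetical per-VM view: logical block number -> shared digest."""

    def __init__(self, store):
        self.store = store
        self.block_map = {}

    def write(self, lbn, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("write must be exactly one block")
        # Reference the new content first so rewriting identical data is safe.
        new_digest = self.store.put(data)
        old_digest = self.block_map.get(lbn)
        if old_digest is not None:
            self.store.release(old_digest)
        self.block_map[lbn] = new_digest

    def read(self, lbn):
        digest = self.block_map.get(lbn)
        if digest is None:
            return bytes(BLOCK_SIZE)  # unwritten blocks read as zeros
        return self.store.get(digest)


# Two VMs cloned from the same four-block "golden master".
store = SharedBlockStore()
vm_a, vm_b = LeanImage(store), LeanImage(store)
golden = [bytes([i]) * BLOCK_SIZE for i in range(4)]
for lbn, block in enumerate(golden):
    vm_a.write(lbn, block)
    vm_b.write(lbn, block)

# One VM diverges on a single block; the other keeps the shared copy.
vm_b.write(3, b"\xff" * BLOCK_SIZE)

logical = len(vm_a.block_map) + len(vm_b.block_map)  # blocks the VMs see
physical = len(store.blocks)                         # blocks actually stored
```

In this toy setup the two images expose eight logical blocks while the store holds five physical ones. Because both images resolve identical content to the same entry, a block read on behalf of one VM could in principle be served from cache for the other, which is the intuition behind the merged datapath.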
