On the impact of virtualization on Dropbox-like cloud file storage/synchronization services

Powered by cloud computing, Dropbox not only provides reliable file storage but also enables effective file synchronization and user collaboration. This new generation of service, which goes beyond the storage-only file hosting of conventional client/server or peer-to-peer systems, has attracted a vast number of Internet users. It is known, however, that the synchronization delay of Dropbox-like systems increases as they expand, often beyond the level acceptable for practical collaboration. In this paper, we present an initial measurement study to understand the design and performance bottlenecks of the proprietary Dropbox system. Our measurements identify the cloud servers/instances used by Dropbox and reveal its hybrid design, which combines Amazon's S3 (for storage) with Amazon's EC2 (for computation). The mix of bandwidth-intensive tasks (such as content delivery) and computation-intensive tasks (such as comparing hash values of the contents) enables seamless collaboration and file synchronization among multiple users. Yet our experiments reveal that interference between these two kinds of tasks creates a severe bottleneck that prolongs the synchronization delay on virtual machines in the cloud, an effect that has not been seen on conventional physical machines. We therefore re-model the resource provisioning problem for Dropbox-like systems and present an interference-aware solution that allocates Dropbox tasks to different cloud instances. Evaluation results show that our solution remarkably reduces the synchronization delay for this new generation of file hosting service.
