Workload models and performance evaluation of cloud storage services

Cloud storage systems are currently very popular, with many companies offering services, including worldwide providers such as Dropbox, Microsoft and Google. These companies, as well as providers entering the market, could benefit greatly from a deep understanding of the typical workload patterns their services face, since that understanding is needed to develop cost-effective solutions. Yet, despite recent studies of the usage and performance of these systems, the underlying processes that generate their workload have not been studied in depth. This paper presents a thorough investigation of the workload generated by Dropbox customers. We propose a hierarchical model that captures user sessions, file system modifications and content sharing patterns. We parameterize our model using passive measurements gathered from four different networks. Next, we use the proposed model to drive the development of CloudGen, a new synthetic workload generator that simulates the network traffic created by cloud storage services in various realistic scenarios. We validate CloudGen by comparing synthetic traces with actual data from operational networks. We then show its applicability by investigating the impact of the continuing growth in cloud storage popularity on bandwidth consumption. Our results indicate that a hypothetical 4-fold increase in both user population and content sharing could lead to 30 times more network traffic. CloudGen is a valuable tool for administrators and developers interested in engineering and deploying cloud storage services.
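To make the idea of a hierarchical workload model concrete, the following is a minimal sketch of a three-layer generator (sessions, file modifications within a session, and propagation of shared content). It is not CloudGen and does not reproduce the paper's fitted model: the function name `generate_workload`, the `Event` record, the choice of exponential and log-normal distributions, and every numeric parameter are assumptions made purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Event:
    time: float  # seconds since the start of the synthetic trace
    user: int    # hypothetical user/device identifier
    kind: str    # "upload" or "download"
    size: int    # bytes transferred


def generate_workload(num_users, horizon, share_prob=0.3, seed=0):
    """Generate a synthetic trace with a three-layer hierarchy.

    All distributions and parameters below are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    events = []
    for user in range(num_users):
        # Layer 1: sessions. Assumed exponential gaps between sessions
        # (mean 1 hour) and log-normal session durations.
        t = rng.expovariate(1 / 3600.0)
        while t < horizon:
            session_end = min(t + rng.lognormvariate(7.0, 1.0), horizon)
            # Layer 2: file modifications inside the session, assumed to
            # arrive with exponential gaps (mean 10 minutes).
            m = t + rng.expovariate(1 / 600.0)
            while m < session_end:
                size = max(1, int(rng.lognormvariate(10.0, 2.0)))
                events.append(Event(m, user, "upload", size))
                # Layer 3: content sharing. With probability share_prob the
                # upload is propagated to one other user as a download.
                # Simplification: the peer is assumed to be online.
                if num_users > 1 and rng.random() < share_prob:
                    peer = rng.choice(
                        [u for u in range(num_users) if u != user]
                    )
                    events.append(Event(m, peer, "download", size))
                m += rng.expovariate(1 / 600.0)
            t = session_end + rng.expovariate(1 / 3600.0)
    events.sort(key=lambda e: e.time)
    return events


def total_bytes(events):
    """Sum the traffic volume of a trace."""
    return sum(e.size for e in events)
```

Calling `generate_workload(num_users=5, horizon=86400.0)` yields a time-ordered list of upload and download events for one simulated day. Raising `num_users` or `share_prob` and comparing `total_bytes` across runs is one way to explore, under these toy assumptions, how population growth and sharing interact.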
