Provisioning data analytic workloads in a cloud

Data analytics applications are well-suited for a cloud environment. In this paper we examine the problem of provisioning resources in a public cloud to execute data analytic workloads. The goal of our provisioning method is to determine the most cost-effective configuration for a given data analytic workload. Provisioning a workload in a public cloud environment faces several challenges: it is difficult to develop accurate performance prediction models using standard methods; the space of possible configurations is very large, so exact solutions cannot be determined efficiently; and the mix and intensity of query classes in a workload vary dynamically over time. We provide a formulation of the provisioning problem and then define a framework to solve it. Our framework contains a cost model that predicts the cost of executing a workload on a configuration, and a method for selecting configurations. The cost model balances resource costs against penalties from service level agreements (SLAs). The specific resource demands and frequencies are accounted for by queueing network models of the Virtual Machines (VMs), which are used to predict performance. We evaluate our approach experimentally using sample data analytic workloads on Amazon EC2.
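The shape of such a cost model can be illustrated with a minimal Python sketch. It is not the paper's model: every name and number below (`QueryClass`, `VMType`, `speedup`, the flat hourly penalty, the even split of load across VMs) is a hypothetical assumption made for illustration. Performance is predicted with the textbook open multi-class, single-service-center formula R_c = D_c / (1 - U), and the configuration is chosen by exhaustive enumeration, which is only feasible for a toy search space and stands in for whatever selection method the framework actually uses.

```python
from dataclasses import dataclass


@dataclass
class QueryClass:
    """Hypothetical description of one query class in the workload."""
    name: str
    arrival_rate: float       # requests per second
    service_demand: float     # seconds of service per request on a baseline VM
    sla_response_time: float  # response-time target in seconds
    penalty_per_hour: float   # assumed flat penalty when the target is missed


@dataclass
class VMType:
    """Hypothetical VM offering; price and speedup are illustrative only."""
    name: str
    price_per_hour: float
    speedup: float            # speed relative to the baseline VM


def predicted_response_times(classes, vm, n_vms):
    """Predict per-class response time, assuming load is split evenly over
    n_vms identical VMs and each VM is a single open queueing center."""
    demands = {c.name: c.service_demand / vm.speedup for c in classes}
    utilization = sum(c.arrival_rate / n_vms * demands[c.name] for c in classes)
    if utilization >= 1.0:
        return None  # saturated: the open model has no steady state
    return {name: d / (1.0 - utilization) for name, d in demands.items()}


def hourly_cost(classes, vm, n_vms):
    """Resource cost plus SLA penalties for one candidate configuration."""
    times = predicted_response_times(classes, vm, n_vms)
    if times is None:
        return float("inf")
    resource_cost = vm.price_per_hour * n_vms
    penalty = sum(c.penalty_per_hour for c in classes
                  if times[c.name] > c.sla_response_time)
    return resource_cost + penalty


def cheapest_configuration(classes, vm_types, max_vms):
    """Enumerate (VM type, VM count) pairs and return the cheapest one."""
    candidates = [(vm, n) for vm in vm_types for n in range(1, max_vms + 1)]
    return min(candidates, key=lambda cfg: hourly_cost(classes, cfg[0], cfg[1]))
```

Under these assumptions, a call such as `cheapest_configuration(classes, vm_types, max_vms=3)` returns the `(VMType, count)` pair with the lowest predicted hourly cost. Adding a VM is worthwhile in this sketch only when its price is lower than the SLA penalties it avoids, which is the trade-off the cost model is described as balancing.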
