A flexible analysis and prediction framework on resource usage in public clouds

In cloud computing environments, users rent virtual machines (VMs) from cloud providers to run their programs or provide network services. One of the biggest problems these users face is determining how many VMs a job needs, given both budget and time constraints. In this paper, we propose a resource prediction framework (RPF) that helps users choose the minimum number of VMs needed to complete their jobs within a user-specified time constraint. To verify the feasibility of RPF, we conducted three case studies: parallel frequent pattern growth (FP-Growth), parallel K-means, and particle swarm optimization (PSO). All three are data-intensive algorithms that are typically executed repeatedly with different execution parameters to find the optimal results. To evaluate RPF with these algorithms in cloud environments, we modified them into parallel versions. The evaluation results indicate that RPF obtains the minimum number of VMs with acceptable errors. Our case studies suggest that RPF can be adopted for data-intensive jobs, providing flexibility to both end users and cloud system providers.
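The selection problem stated above, finding the smallest number of VMs whose predicted completion time meets a deadline, can be illustrated with a minimal sketch. This is not the RPF method itself, which the abstract does not detail. The runtime model T(n) = a + b / n (a fixed serial cost plus a perfectly divisible parallel cost), the function names `fit_runtime_model` and `min_vms`, and the sample timings are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


def fit_runtime_model(samples: Sequence[Tuple[int, float]]) -> Tuple[float, float]:
    """Least-squares fit of the assumed model T(n) = a + b / n.

    `samples` holds (number_of_vms, measured_seconds) pairs from trial runs.
    The model is a hypothetical stand-in, not the predictor used by RPF.
    """
    xs = [1.0 / n for n, _ in samples]
    ys = [t for _, t in samples]
    m = len(samples)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need trial runs with at least two distinct VM counts")
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    return a, b


def min_vms(predict: Callable[[int], float], deadline: float, max_vms: int) -> Optional[int]:
    """Return the smallest n in 1..max_vms with predict(n) <= deadline, else None.

    A linear scan is used so that no monotonicity of `predict` is assumed.
    """
    for n in range(1, max_vms + 1):
        if predict(n) <= deadline:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical trial runs: (VM count, seconds).
    trial_runs = [(1, 110.0), (2, 60.0), (4, 35.0)]
    a, b = fit_runtime_model(trial_runs)
    choice = min_vms(lambda n: a + b / n, deadline=32.0, max_vms=64)
    print(f"fitted a={a:.2f}, b={b:.2f}, minimum VMs={choice}")
```

Under this assumed model the serial term `a` is a lower bound on the runtime, so a deadline below `a` cannot be met by any number of VMs and `min_vms` returns `None`.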
