SLA-Aware Scheduling of Map-Reduce Applications on Public Clouds

The recent need to process Big Data has led to the development of many Map-Reduce applications for efficient large-scale processing. Because they offer large computing resources on demand, public Clouds have become a natural host for these applications. In this setting, users must decide which resources to rent for their Map-Reduce cluster, in addition to deploying and scheduling the Map-Reduce tasks themselves. This is not a trivial task, particularly when users have performance constraints such as a deadline and must choose among several Cloud product types while keeping spending low. Although several scheduling systems exist, most of them were not developed to schedule Map-Reduce applications. That is, they do not consider factors such as the number of map and reduce tasks and the number of slots per VM. This paper proposes a novel greedy scheduling algorithm (MASA) that minimizes the cost of renting Cloud resources while respecting the user's budget and deadline constraints. Simulation results show that the proposed algorithm reduces cost by 25-60% in comparison to current methods.
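To make the provisioning problem concrete, the following is a minimal sketch of a greedy selection under a deadline and a budget. It is not MASA itself, whose details are not given here. The cost model (whole-hour billing), the wave-based makespan estimate, the restriction to homogeneous clusters, and all names (`VMType`, `Job`, `makespan_hours`, `cheapest_plan`, `max_vms`) are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class VMType:
    name: str
    slots: int             # concurrent task slots per VM (assumed shared by map and reduce)
    price_per_hour: float  # rental price per VM-hour
    speedup: float         # task speed relative to a baseline VM (1.0 = baseline)


@dataclass(frozen=True)
class Job:
    map_tasks: int
    reduce_tasks: int
    map_task_hours: float     # duration of one map task on a baseline VM
    reduce_task_hours: float  # duration of one reduce task on a baseline VM


def makespan_hours(job: Job, vm: VMType, count: int) -> float:
    """Estimate run time assuming tasks execute in full waves over all slots."""
    total_slots = vm.slots * count
    map_waves = math.ceil(job.map_tasks / total_slots)
    reduce_waves = math.ceil(job.reduce_tasks / total_slots)
    baseline = map_waves * job.map_task_hours + reduce_waves * job.reduce_task_hours
    return baseline / vm.speedup


def cheapest_plan(
    job: Job,
    vm_types: Sequence[VMType],
    deadline_hours: float,
    budget: float,
    max_vms: int = 64,
) -> Optional[Tuple[float, str, int, float]]:
    """Return (cost, vm_name, vm_count, makespan) or None if nothing is feasible.

    Greedy heuristic: for each VM type, take the smallest cluster that meets
    the deadline, then keep the cheapest such cluster that fits the budget.
    """
    best = None
    for vm in vm_types:
        for count in range(1, max_vms + 1):
            hours = makespan_hours(job, vm, count)
            if hours > deadline_hours:
                continue  # too slow, try a larger cluster of this type
            # Assumption: each VM is billed per started hour.
            cost = count * vm.price_per_hour * math.ceil(hours)
            if cost <= budget and (best is None or cost < best[0]):
                best = (cost, vm.name, count, hours)
            break  # stop at the smallest feasible cluster for this type
    return best
```

Because the search stops at the smallest feasible cluster per VM type, this heuristic can miss cheaper mixed or larger configurations; it only illustrates how task counts, slots per VM, deadline, and budget interact in the decision.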
