Scheduling big data workflows in the cloud under budget constraints

Big data is fast becoming a ubiquitous term in both academia and industry, and there is a strong need for new data-centric workflow tools and techniques to process and analyze large-scale, complex datasets that are growing exponentially. Meanwhile, the virtually unbounded resource-leasing capability offered by the cloud enables data scientists to extract actionable insights from data in a time- and cost-efficient manner. In a data-centric workflow environment, the scheduling of data processing tasks onto appropriate resources is often driven by user-provided constraints. Enforcing such a constraint while executing the workflow in the cloud adds a new optimization challenge: how to meet the objective while satisfying the given constraint. In this paper, we propose a new Big dAta woRkflow schEduler uNder budgeT constraint, known as BARENTS, that supports high-performance workflow scheduling in a heterogeneous cloud computing environment with a single objective: to minimize the workflow makespan under a given budget constraint. Our case study and experiments show the competitive advantages of the proposed scheduler. BARENTS is implemented in a new release of DATA VIEW, one of the most usable big data workflow systems in the community.
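The abstract does not spell out how BARENTS chooses resources, so what follows is only a hedged illustration of the optimization problem it targets: pick a VM type for every workflow task so that the makespan is as small as possible while the total cost stays within a budget. The sketch uses a generic greedy upgrade heuristic, not BARENTS itself. Everything in it is an assumption made for illustration: the hypothetical `VM_TYPES` catalogue with its speeds and per-second prices, the helper names `makespan`, `cost`, and `greedy_schedule`, one dedicated VM per task, per-second billing, and no data-transfer costs.

```python
from graphlib import TopologicalSorter

# Hypothetical VM catalogue: type -> (speed in work units/second, price in $/second).
# These numbers are illustrative only, not taken from any real provider.
VM_TYPES = {
    "small": (1.0, 0.010),
    "medium": (2.0, 0.025),
    "large": (4.0, 0.060),
}


def durations(work, assign):
    """Runtime of each task on its assigned VM type."""
    return {t: work[t] / VM_TYPES[assign[t]][0] for t in work}


def makespan(work, deps, assign):
    """Length of the longest path through the DAG (assumes one dedicated VM per task)."""
    dur = durations(work, assign)
    graph = {t: deps.get(t, ()) for t in work}  # task -> its predecessors
    finish = {}
    for t in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[t]), default=0.0)
        finish[t] = start + dur[t]
    return max(finish.values())


def cost(work, assign):
    """Monetary cost assuming per-second billing and no data-transfer charges."""
    dur = durations(work, assign)
    return sum(dur[t] * VM_TYPES[assign[t]][1] for t in work)


def greedy_schedule(work, deps, budget):
    """Generic greedy heuristic (not BARENTS): upgrade tasks while the budget allows."""
    # Start every task on the type with the lowest price per unit of work.
    cheapest = min(VM_TYPES, key=lambda v: VM_TYPES[v][1] / VM_TYPES[v][0])
    assign = {t: cheapest for t in work}
    if cost(work, assign) > budget:
        return None  # even the cheapest assignment exceeds the budget
    while True:
        base_ms, base_cost = makespan(work, deps, assign), cost(work, assign)
        best = None
        for t in work:
            for v in VM_TYPES:
                if v == assign[t]:
                    continue
                trial = {**assign, t: v}
                new_cost = cost(work, trial)
                gain = base_ms - makespan(work, deps, trial)
                if new_cost > budget or gain <= 0:
                    continue  # over budget, or the makespan does not shrink
                # Prefer the largest makespan reduction per extra dollar spent.
                score = gain / max(new_cost - base_cost, 1e-12)
                if best is None or score > best[0]:
                    best = (score, trial)
        if best is None:
            return assign  # no feasible move shortens the makespan
        assign = best[1]  # each accepted move strictly lowers the makespan


if __name__ == "__main__":
    work = {"A": 4.0, "B": 8.0, "C": 4.0}  # hypothetical workloads
    deps = {"B": ["A"], "C": ["A"]}        # B and C both depend on A
    plan = greedy_schedule(work, deps, budget=0.2)
    print(plan, makespan(work, deps, plan), cost(work, plan))
```

In this toy workflow the all-`small` assignment has a makespan of 12 seconds and costs $0.16, so any budget below $0.16 makes `greedy_schedule` return `None`. Ranking candidate moves by makespan gain per extra dollar is one simple way to trade time against money; a production scheduler would also have to model VM reuse across tasks, coarser billing granularity, and data movement between tasks, all of which this sketch ignores.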
