A Pretreatment Workflow Scheduling Approach for Big Data Applications in Multicloud Environments

The rapid development of cloud computing, the latest distributed computing paradigm, has produced a highly fragmented cloud market composed of numerous cloud providers, and it offers tremendous parallel computing capacity for handling big data problems. One of the biggest challenges in multiclouds is efficient workflow scheduling. Although the workflow scheduling problem has been studied extensively, very few works are tailored to multicloud environments. Moreover, existing research either fails to satisfy quality of service (QoS) requirements or does not consider fundamental features of cloud computing such as the heterogeneity and elasticity of computing resources. In this paper, we present a scheduling algorithm for big data workflows in multiclouds, called multiclouds partial critical paths with pretreatment (MCPCPP). The algorithm incorporates the concept of partial critical paths and aims to minimize the execution cost of a workflow while satisfying a defined deadline constraint. Our approach takes into account the essential characteristics of multiclouds, such as charging per time interval, the various instance types offered by different cloud providers, and homogeneous intracloud bandwidth versus heterogeneous intercloud bandwidth. Various types of workflows are used for evaluation, and our experimental results show that MCPCPP is promising.
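
To make the cost model concrete, the following is a minimal sketch of charging per time interval combined with a deadline check for a single task. It is not the MCPCPP algorithm. The names InstanceType, rental_cost, and cheapest_within_deadline, and all numeric values, are hypothetical and chosen only for illustration. The sketch assumes that any started billing interval is charged in full.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical description of one instance type from one provider."""
    name: str
    speed: float     # work units processed per second (assumed)
    price: float     # cost charged per billing interval (assumed)
    interval: float  # billing interval length in seconds (assumed)


def rental_cost(runtime: float, inst: InstanceType) -> float:
    """Cost of renting inst for runtime seconds.

    Assumes a partially used interval is charged as a full one.
    """
    intervals = math.ceil(runtime / inst.interval)
    return intervals * inst.price


def cheapest_within_deadline(workload: float, deadline: float, catalog):
    """Return (instance, cost) for the cheapest type that finishes
    the workload within the deadline, or None if no type is fast enough."""
    best = None
    for inst in catalog:
        runtime = workload / inst.speed
        if runtime > deadline:
            continue  # this type would miss the deadline
        cost = rental_cost(runtime, inst)
        if best is None or cost < best[1]:
            best = (inst, cost)
    return best


# Illustrative catalog; the figures are made up.
catalog = [
    InstanceType("small", speed=1.0, price=1.0, interval=3600.0),
    InstanceType("large", speed=4.0, price=3.0, interval=3600.0),
]
tight = cheapest_within_deadline(7200.0, 3600.0, catalog)
loose = cheapest_within_deadline(7200.0, 7200.0, catalog)
```

With the tight deadline only the faster type is feasible, while the looser deadline lets the slower and cheaper type win. This is the basic trade-off a cost-minimizing, deadline-constrained scheduler has to navigate.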
