Scientific workflow scheduling in non-dedicated heterogeneous multicluster with advance reservations

Scientific workflows structured as Parallel Task Graphs (PTGs) exhibit both data and task parallelism and arise in scientific as well as industrial domains. Efficiently scheduling such workflows on a multicluster platform has been a longstanding challenge. Most previous work on PTG scheduling has focused on dedicated multiclusters. In this paper, a novel scheduling algorithm, Moldable Task Duplication (MTD), is applied to non-dedicated heterogeneous multicluster platforms with advance reservations. A novel method is proposed for calculating the dynamic critical path that handles both the fluctuating availability of the multicluster and the moldability of the workflow’s data-parallel tasks. A moldable task duplication strategy with migration of pre-duplicated predecessor tasks is developed to fully exploit the flexibility of data-parallel tasks. Simulations spanning a broad range of scientific workflow and multicluster platform settings are performed to verify the proposed approach. The numerical results show that MTD can achieve better average PTG makespan than previous methods in
