Decentralised workflow scheduling in volunteer computing systems

Volunteer computing systems exploit large numbers of geographically dispersed resources on the Internet to solve complex scientific problems. However, scheduling scientific workflows in a fully decentralised way and with low overhead is a challenging task in these environments. To address this challenge, this paper presents a fully decentralised, proximity-aware workflow-scheduling policy. The proposed scheduling policy consists of three phases. In the first phase, each workflow application is partitioned into sub-workflows so as to minimise the data dependencies among them. In the second phase, resources are found to execute each sub-workflow; they are selected based on the Quality of Service (QoS) constraints of the workflow, load balancing and the proximity of resources. Each workflow can have QoS constraints in terms of minimum CPU speed and minimum RAM or hard disk requirements. In the third phase, the sub-workflows are executed on each resource according to a local scheduling algorithm that minimises the partial makespan. The proposed scheduling policy focuses on reducing communication overhead to improve the performance of I/O-intensive and data-intensive workflows. Simulation results show that the proposed workflow-scheduling policy improves the average response time of scientific workflows by up to 53.6% under a moderate load.
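
The resource-selection phase described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `Resource` and `QoS` types, the `select_resource` function, the use of latency as the proximity measure, and the weighted score combining load and proximity are all assumptions made for illustration. The sketch filters resources by the minimum CPU, RAM and disk constraints, then picks the candidate with the lowest combined load and proximity cost.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Resource:
    """A volunteer node (hypothetical model, not taken from the paper)."""
    name: str
    cpu_ghz: float
    ram_gb: float
    disk_gb: float
    load: float        # fraction of capacity in use, 0.0 to 1.0 (assumed metric)
    latency_ms: float  # network distance to the requester (assumed proximity metric)


@dataclass(frozen=True)
class QoS:
    """Minimum requirements a workflow may state."""
    min_cpu_ghz: float = 0.0
    min_ram_gb: float = 0.0
    min_disk_gb: float = 0.0


def meets_qos(resource: Resource, qos: QoS) -> bool:
    """Return True if the resource satisfies every minimum requirement."""
    return (
        resource.cpu_ghz >= qos.min_cpu_ghz
        and resource.ram_gb >= qos.min_ram_gb
        and resource.disk_gb >= qos.min_disk_gb
    )


def select_resource(
    resources: Iterable[Resource],
    qos: QoS,
    max_latency_ms: float = 200.0,  # assumed normalisation bound
    load_weight: float = 0.5,       # assumed trade-off between load and proximity
) -> Optional[Resource]:
    """Pick the QoS-compliant resource with the lowest load/proximity cost."""
    candidates = [r for r in resources if meets_qos(r, qos)]
    if not candidates:
        return None

    def cost(r: Resource) -> float:
        # Normalise latency to [0, 1] so it is comparable with load.
        proximity_cost = min(r.latency_ms / max_latency_ms, 1.0)
        return load_weight * r.load + (1.0 - load_weight) * proximity_cost

    return min(candidates, key=cost)
```

With equal weights, a nearby but moderately loaded node can be preferred over a distant idle one, which reflects the stated focus on reducing communication overhead. The weights and the latency bound would need tuning for any real deployment.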
