Time and Cost-Driven Scheduling of Data Parallel Tasks in Grid Workflows

Identifying suitable computing resources to solve a scientific or engineering problem in a Grid environment requires increasingly sophisticated resource management systems, for two reasons: 1) strategies and technologies must master the complexity of modern large-scale networks and computing facilities, and 2) the convergence of Grid computing toward the service-oriented approach is fostering a new vision in which economic aspects are central to boosting the adoption of computing as a utility. In this context, the design and execution of data- and compute-intensive applications are often simplified by adopting model-driven approaches based on workflows. The execution of Grid workflows can leverage meta-scheduling systems to automatically and transparently allocate tasks to resources that satisfy the functional requirements and quality-of-service (QoS) constraints specified by the user. This paper presents a time- and cost-constrained scheduling strategy that, following the data parallelism pattern, deploys scientific and business workflow tasks (or other kinds of application tasks) on pools of resources selected to minimize the overall execution time. The strategy was implemented as a plug-in for a Grid service matchmaker, and its validity and accuracy were evaluated experimentally on a real testbed using a framework for the deployment of data parallel tasks. The results show that task deployment is effective and accurate, and they pave the way for using the Internet as a utility computing facility.
