Optimal workload allocation model for scheduling divisible data grid applications

In many data grid applications, data can be decomposed into multiple independent sub-datasets and distributed for parallel execution and analysis. This property has been exploited successfully through Divisible Load Theory (DLT), which has proved to be a powerful tool for modeling divisible load problems in data-intensive grids. Several scheduling models have been studied, but none has reached an optimal solution because of the heterogeneity of grids. This paper proposes a new model, the Iterative DLT (IDLT), for scheduling divisible data grid applications. Recursive numerical closed-form solutions are derived to find the optimal workload to assign to each processing node. Experimental results show that the proposed IDLT model achieves a better, almost optimal, makespan than the other models.
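To make the idea of divisible workload allocation concrete, the following is a minimal sketch of the equal-finish-time principle that underlies DLT. It is not the IDLT model proposed in the paper: it assumes a simplified, computation-only setting with no communication cost, and the names `equal_finish_fractions`, `makespan`, and `unit_times` are hypothetical, introduced only for illustration.

```python
def equal_finish_fractions(unit_times):
    """Split a unit load so that all nodes finish computing together.

    unit_times[i] is the (assumed) time node i needs to process one
    unit of load. Communication delays are ignored in this sketch.
    """
    if not unit_times or any(w <= 0 for w in unit_times):
        raise ValueError("unit_times must be non-empty and positive")
    # A node's share is proportional to its speed (1 / unit time).
    speeds = [1.0 / w for w in unit_times]
    total_speed = sum(speeds)
    return [s / total_speed for s in speeds]


def makespan(fractions, unit_times, load=1.0):
    """Completion time of the slowest node for a given allocation."""
    return max(a * w * load for a, w in zip(fractions, unit_times))


# Hypothetical heterogeneous nodes: the third is four times slower
# than the first.
unit_times = [1.0, 2.0, 4.0]
fractions = equal_finish_fractions(unit_times)
even_split = [1.0 / len(unit_times)] * len(unit_times)

print("fractions:", fractions)
print("makespan (equal finish):", makespan(fractions, unit_times))
print("makespan (even split):  ", makespan(even_split, unit_times))
```

Under these assumptions, the faster nodes receive proportionally more load, and the resulting makespan is shorter than with an even split. Models for real data grids must also account for data transfer times, which this sketch deliberately omits.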
