Intelligent Scheduling and Replication in Datagrids: a Synergistic Approach

In large-scale data-intensive applications, data plays a pivotal role in execution, and data transfer is the primary cause of job execution delay. In environments such as data grids, where jobs require large amounts of data, close collaboration between the scheduling and data management services becomes essential to achieve a synergistic effect on grid performance. This paper presents an intelligent data grid framework in which job scheduling and data and replica management are coupled, providing an integrated environment for efficient data access and job scheduling. The data management service predicts suitable replica locations and proactively replicates datasets to those sites. Meanwhile, an intelligent Tabu Search-based scheduler uses information about the datasets to dispatch jobs to sites, aiming to minimize job execution time and improve overall system utilization. Evaluation of the framework shows significant improvements in both grid performance and job execution time.
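The abstract does not give the scheduler's cost model, neighbourhood or tabu rules, so what follows is only a minimal sketch of a data-aware Tabu Search scheduler, not the paper's method. It rests on several assumptions:

- Each job reads a single dataset.
- A job's estimated time at a site is stage-in time plus compute time. Stage-in is zero if a replica is already there; otherwise it is the dataset size divided by one uniform bandwidth.
- Each site runs its jobs back to back.
- The objective is the makespan.

The names `Job`, `Site`, `job_cost`, `makespan` and `tabu_schedule`, and the parameters `iters` and `tenure`, are hypothetical. The neighbourhood moves one job to another site. A short tabu list forbids moving a job straight back, with the usual aspiration rule: a tabu move is still allowed if it beats the best schedule found so far.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    work: float    # compute demand in hypothetical work units
    dataset: str   # the single input dataset the job reads (assumption)


@dataclass(frozen=True)
class Site:
    speed: float   # work units processed per second


def job_cost(job, site, sites, replicas, sizes, bandwidth):
    """Estimated time for `job` at `site`: stage-in time plus compute time.

    Stage-in is free if a replica of the job's dataset is already at the site;
    otherwise the whole dataset is copied at one uniform `bandwidth` (assumption).
    """
    local = site in replicas[job.dataset]
    transfer = 0.0 if local else sizes[job.dataset] / bandwidth
    return transfer + job.work / sites[site].speed


def makespan(assign, jobs, sites, replicas, sizes, bandwidth):
    """Finish time of the busiest site, assuming each site runs its jobs back to back."""
    load = [0.0] * len(sites)
    for j, s in enumerate(assign):
        load[s] += job_cost(jobs[j], s, sites, replicas, sizes, bandwidth)
    return max(load)


def tabu_schedule(jobs, sites, replicas, sizes, bandwidth, iters=200, tenure=5):
    """Return (assignment, makespan); assignment[j] is the site index for job j."""

    def cost(assign):
        return makespan(assign, jobs, sites, replicas, sizes, bandwidth)

    # Greedy start: place each job where it would finish earliest given current loads.
    load = [0.0] * len(sites)
    current = []
    for job in jobs:
        site = min(range(len(sites)),
                   key=lambda k: load[k] + job_cost(job, k, sites, replicas, sizes, bandwidth))
        load[site] += job_cost(job, site, sites, replicas, sizes, bandwidth)
        current.append(site)

    best, best_cost = list(current), cost(current)
    tabu = deque(maxlen=tenure)  # (job, site) pairs a job may not return to for now

    for _ in range(iters):
        cand, cand_cost, cand_move = None, float("inf"), None
        # Neighbourhood: every schedule that differs by moving exactly one job.
        for j in range(len(jobs)):
            for s in range(len(sites)):
                if s == current[j]:
                    continue
                neighbour = list(current)
                neighbour[j] = s
                c = cost(neighbour)
                # Aspiration: a tabu move is allowed only if it beats the best so far.
                if (j, s) in tabu and c >= best_cost:
                    continue
                if c < cand_cost:
                    cand, cand_cost, cand_move = neighbour, c, (j, current[j])
        if cand is None:
            break  # no admissible move left
        # Take the best admissible move even if it is worse (how tabu search escapes
        # local optima), and forbid moving that job straight back to its old site.
        tabu.append(cand_move)
        current = cand
        if cand_cost < best_cost:
            best, best_cost = list(cand), cand_cost
    return best, best_cost


if __name__ == "__main__":
    # Two equal sites; dataset d1 is replicated only at site 0, d2 only at site 1.
    sites = [Site(1.0), Site(1.0)]
    replicas = {"d1": {0}, "d2": {1}}
    sizes = {"d1": 100.0, "d2": 100.0}
    jobs = [Job(10.0, "d1"), Job(10.0, "d1"), Job(10.0, "d2"), Job(10.0, "d2")]
    print(tabu_schedule(jobs, sites, replicas, sizes, bandwidth=10.0))
    # prints ([0, 0, 1, 1], 20.0): every job runs where its data already is
```

In the example, the scheduler keeps each job next to its replica because remote stage-in would double its cost. The value of the tabu phase shows up on a data-free variant: jobs of work 3, 3, 2, 2 and 2 on two equal sites. There, the greedy start has makespan 7 and no single move improves it, so a plain hill climber stops. The sketch instead accepts worsening moves and then reaches the optimum of 6 through an aspiration move. For larger grids, tracking per-site loads incrementally would avoid re-evaluating the full makespan for every candidate move.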
