Improvement of Data Grid's Performance by Combining Job Scheduling with Dynamic Replication Strategy

Handling large volumes of data makes efficient data access critical in a data grid. Improving the performance of data replication reduces job execution cost. In this paper, we propose a new replication strategy that organizes the data in a data grid according to its properties. In particular, we group the data into the categories to which it belongs, and this categorization improves replica placement. We also introduce a dataset scheduler that optimizes how jobs obtain their input data. Performance studies conducted with a simulation tool show improved scheduling performance over current approaches.
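The abstract does not give the algorithms themselves, so the following is only a minimal sketch of how category-aware replica placement and data-aware job placement could fit together. Every name here (`record_access`, `choose_replica_site`, `choose_job_site`, the site and category labels) is hypothetical, and the selection rules (most frequent requester of a category, site holding the most input files) are assumptions made for illustration, not the paper's method.

```python
from collections import Counter, defaultdict

# Hypothetical bookkeeping: per-site access counts, keyed by data category.
access_counts = defaultdict(Counter)  # site -> Counter(category -> accesses)


def record_access(site, category):
    """Note that a job at `site` read a file belonging to `category`."""
    access_counts[site][category] += 1


def choose_replica_site(category, sites):
    """Assumed placement rule: replicate to the site that has requested
    this category most often so far."""
    return max(sites, key=lambda s: access_counts[s][category])


def choose_job_site(input_files, holdings):
    """Assumed scheduling rule: run the job at the site that already holds
    the largest number of its input files, to reduce transfers.
    `holdings` maps site -> set of file names stored there."""
    wanted = set(input_files)
    return max(holdings, key=lambda s: len(holdings[s] & wanted))
```

In this sketch a file's category acts as a proxy for future demand: sites that read many files of one category are assumed likely to read more of them, so replicas are placed there ahead of time.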
