Combination of Replication and Scheduling in Data Grids

A Data Grid is a geographically distributed environment that supports data-intensive applications in scientific and enterprise computing. Because these applications handle large amounts of data, efficient data access becomes critical. The goal of replication is not only to shorten data access times for users but also to improve job execution performance. In this paper, we propose a new approach to replication that organizes the data in a Data Grid according to its properties. We group the data into several categories and use this category information to improve the replica placement strategy. We study our approach and evaluate it through simulation. The results show that our algorithm achieves a 30% improvement over current strategies.
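The abstract does not say how category information feeds into replica placement. The Python sketch below shows one possible interpretation, not the authors' algorithm. It assumes that each file's category is known in advance, and that a site which repeatedly reads files from one category is a good place for replicas of other files in that category. The class name `CategoryReplicaPlanner`, the `threshold` parameter, and the access-count heuristic are all hypothetical.

```python
from collections import defaultdict


class CategoryReplicaPlanner:
    """Hypothetical sketch: decide replica placement from per-category access history.

    This is an illustrative heuristic, not the method evaluated in the paper.
    """

    def __init__(self, file_category, threshold=3):
        # file_category: dict mapping file name -> data category (assumed known a priori)
        self.file_category = file_category
        # threshold: hypothetical number of same-category accesses that triggers replication
        self.threshold = threshold
        # access_counts[site][category] = accesses from `site` to files of `category`
        self.access_counts = defaultdict(lambda: defaultdict(int))

    def record_access(self, site, filename):
        # Attribute each file access to the file's category, not just the file itself.
        category = self.file_category[filename]
        self.access_counts[site][category] += 1

    def should_replicate(self, site, filename):
        # Replicate to `site` once it has accessed enough files of the same category,
        # assuming it is likely to request further files from that category.
        category = self.file_category[filename]
        return self.access_counts[site][category] >= self.threshold

    def best_site_for(self, filename):
        # Return the site with the most accesses to this file's category,
        # or None if no site has accessed that category yet.
        category = self.file_category[filename]
        best_site, best_count = None, 0
        for site, counts in self.access_counts.items():
            if counts[category] > best_count:
                best_site, best_count = site, counts[category]
        return best_site
```

For example, after a site reads two files tagged with the same category, a planner built with `threshold=2` would suggest replicating a third file from that category to the site before it is requested. Grouping access history by category rather than by individual file lets the planner act on files a site has never touched.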
