Dynamic file replica location and selection strategy in data grids

In this paper, we present the design of the PU-DG optimizer toolbox (also known as PU-DG Optibox), which both identifies the best strategy from a large body of simulation results and proposes a min-max workload balancing method to improve execution efficiency in data grid environments. Data grids are a key component in building large-scale dataset storage systems and providing high-performance computing capacity, since they connect computing and storage resources dispersed across the grid. One major challenge in data grids is how to provide good and timely access to huge amounts of data in distributed locations, given the high latency of the interconnection networks. The proposed toolbox is a package of techniques and methods that runs as middleware on top of data grid platforms and optimizes file downloads by improving their efficiency and performance. PU-DG Optibox also lets users and developers set their own priority strategies. In addition, the min-max workload balancing method is proposed to avoid assigning a job with a high complexity factor to a computing node with low network bandwidth. Experimental results for the techniques packaged in the proposed toolbox demonstrate its effectiveness.
