Design and Evaluation of Dynamic Replication Strategies for a High-Performance Data Grid

Physics experiments that generate large amounts of data must be able to share that data with researchers around the world. High-performance grids facilitate the distribution of such data to geographically remote sites. Dynamic replication can reduce the bandwidth consumption and access latency incurred in accessing these large data sets. We describe a simulation framework that we developed to model a grid scenario and that enables comparative studies of alternative dynamic replication strategies. We present preliminary results obtained with this simulator, in which we evaluate the performance of six replication strategies under three kinds of access patterns. The simulation results show that the best strategy yields significant savings in latency and bandwidth consumption when the access patterns contain a moderate amount of geographical locality.
