Replica selection strategies in data grid

Replication in Data Grids reduces access latency and bandwidth consumption. When several sites hold replicas of a dataset, choosing the best replica for each request can minimize access latency. In this research, we propose two replica selection techniques. First, to select the best replica using only locally gathered information, we apply a simple technique, the k-Nearest Neighbor (KNN) rule. The KNN rule selects the best replica for a file by consulting previous file transfer logs, which record the history of that file and of those nearby. Second, we propose a neural network based predictive technique to estimate the transfer time between sites. The predicted transfer time serves as an estimate of the transfer bandwidth to each site that currently holds a replica, and so helps in selecting the best replica among those sites. Simulation results demonstrate that the KNN rule yields a significant performance improvement over the traditional replica catalog based model. In addition, the neural network predictive technique estimates the transfer time between sites more accurately than the multi-regression model.
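
The KNN rule described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the log schema (a file index paired with the site that served it best), the distance measure (absolute difference of file indices), the tie-breaking rule, and the name `knn_select_replica` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical log entry: (file index, site that gave the best transfer).
TransferLog = List[Tuple[int, str]]


def knn_select_replica(
    log: TransferLog,
    file_id: int,
    candidate_sites: Iterable[str],
    k: int = 3,
) -> Optional[str]:
    """Pick a replica site by majority vote among the k nearest log entries."""
    candidates = set(candidate_sites)
    # Keep only entries for sites that currently hold a replica, and
    # measure how far each logged file is from the requested one.
    scored = [
        (abs(logged_id - file_id), site)
        for logged_id, site in log
        if site in candidates
    ]
    if not scored:
        return None  # no usable history; a caller could fall back to a catalog lookup
    scored.sort(key=lambda entry: entry[0])
    neighbours = scored[:k]

    votes = Counter(site for _, site in neighbours)
    # Smallest distance at which each site appears among the neighbours.
    nearest = {}
    for dist, site in neighbours:
        nearest.setdefault(site, dist)
    # Most votes wins; ties go to the site with the closest neighbour.
    return max(votes, key=lambda s: (votes[s], -nearest[s]))


if __name__ == "__main__":
    history = [(10, "siteA"), (11, "siteA"), (12, "siteB"), (40, "siteC"), (41, "siteC")]
    print(knn_select_replica(history, 11, {"siteA", "siteB", "siteC"}, k=3))  # siteA
```

In this sketch the only input is the local transfer history, which mirrors the idea of selecting a replica from locally gathered information without querying every candidate site.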
