Heuristics-Based Replication Schemas for Fast Information Retrieval over the Internet

Today, the Internet has become a global information hub. The growth in its usage and size has raised a range of research problems. Because of its diverse user population and the high frequency of access requests, data replication has emerged, alongside object caching, as a way to decrease latency and communication cost, optimize bandwidth, use storage space effectively, and add reliability to the system. In this paper we address the fine-grained replication of data among a set of Internet sites and develop a cost model for it. We solve this problem by proposing data placement algorithms. Four of our proposed techniques are based on the A-star state-space search algorithm: an optimal A-star-based technique, complemented by three sub-optimal heuristics. In addition, we propose two natural, greedy selection algorithms: Local Min-Min and Global Min-Min. These algorithms are effective in various environments, giving vendors and users a choice of algorithms that guarantee fast solutions, optimal solutions, or both.
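The abstract names a cost model and greedy selection algorithms without spelling them out. As a rough illustration only (this is not the authors' cost model or their Min-Min algorithms), the sketch below assumes a simple hypothetical model: each site reads an object from its nearest replica, each update is pushed from the object's primary site to every replica, and cost is transfer volume times distance. It then applies a global greedy loop in the spirit of Min-Min, repeatedly adding the single (site, object) replica that yields the lowest total cost until no addition helps or storage runs out. The names `total_cost` and `greedy_placement` and all parameters are hypothetical.

```python
from itertools import product


def total_cost(replicas, reads, writes, size, dist, primary):
    """Hypothetical cost model: network transfer cost of a placement.

    replicas[k] is the set of sites holding object k; reads[i][k] is the
    read count of object k at site i; writes[k] is the update count of
    object k; dist[i][j] is the per-unit transfer cost between sites.
    """
    cost = 0
    for k, holders in enumerate(replicas):
        # Reads: every site fetches object k from its nearest replica.
        for i in range(len(dist)):
            nearest = min(dist[i][j] for j in holders)
            cost += reads[i][k] * size[k] * nearest
        # Writes: each update is propagated from the primary to every replica.
        for j in holders:
            cost += writes[k] * size[k] * dist[primary[k]][j]
    return cost


def greedy_placement(reads, writes, size, dist, primary, capacity):
    """Global greedy placement in the spirit of Min-Min (a sketch).

    Assumes each site's capacity is large enough to hold its primaries.
    Returns (replicas, final_cost).
    """
    n_sites, n_objects = len(dist), len(size)
    replicas = [{primary[k]} for k in range(n_objects)]
    free = list(capacity)
    for k in range(n_objects):
        free[primary[k]] -= size[k]  # primary copies also occupy storage
    current = total_cost(replicas, reads, writes, size, dist, primary)

    while True:
        best = None  # (cost, site, object) of the best single addition
        for i, k in product(range(n_sites), range(n_objects)):
            if i in replicas[k] or size[k] > free[i]:
                continue  # already replicated there, or no room
            replicas[k].add(i)
            cost = total_cost(replicas, reads, writes, size, dist, primary)
            replicas[k].remove(i)
            if cost < current and (best is None or cost < best[0]):
                best = (cost, i, k)
        if best is None:
            return replicas, current  # no addition lowers the cost
        current, i, k = best
        replicas[k].add(i)
        free[i] -= size[k]


if __name__ == "__main__":
    # Three sites on a line; one object of size 1 whose primary is site 0.
    dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    print(greedy_placement([[1], [1], [10]], [1], [1], dist, [0], [5, 5, 5]))
    # -> ([{0, 2}], 3)
```

In the example, site 2 reads the object ten times, so replicating it there cuts the cost from 21 to 3. A further replica at site 1 would save one unit of read cost but add one unit of update cost, so the loop stops. Recomputing the full cost for every candidate keeps the sketch simple at the price of efficiency; an incremental cost update would be the natural refinement.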
