An overview of data replication on the Internet

The proliferation of the Internet has raised expectations of fast turnaround times. Clients who abandon their connections because of excessive download delays translate directly into lost profit. Hence, minimizing the latency perceived by end users has become the primary performance objective, ahead of more traditional concerns such as server utilization. Two promising techniques for improving Internet responsiveness are caching and replication. In this paper we present an overview of recent research on replication. We begin by arguing for the important role of replication in decreasing client-perceived response time and outline the main issues that affect its successful deployment on the Internet. We analyze and characterize existing research, providing taxonomies and classifications wherever possible. Our discussion reveals several open problems and research directions.
