Cache Placement Methods Based on Client Demand Clustering

One of the principal motivations for content delivery networks, such as those that have been deployed over the past three years, is to improve performance from the perspective of the client. Ideally, this is achieved by placing caches close to groups of clients and then routing client requests to the nearest cache. In the first part of this paper we present a new method for identifying regions of client demand. Our method uses best path information from Border Gateway Protocol (BGP) routing tables to create a hierarchical clustering of autonomous systems (AS’s). The method iteratively adds small clusters to larger clusters based on minimizing the Hamming distance between the neighbor sets of the clusters. This method results in a forest of AS trees where we define each tree root as an Internet backbone node. This forest representation of AS connectivity is an idealization of the Internet’s true structure. We test for fidelity by comparing AS hop distances to the Internet backbone. One of the strengths of our AS clustering method is that it naturally lends itself to the cache placement problem. In the second part of this paper, we present two cache placement algorithms based on a tree graph of demand. The algorithms address the problems of placing single caches and multiple caches so as to minimize inter-AS traffic and client response time. We evaluate the effectiveness of our cache placement algorithms using Web server logs and show that they can greatly improve performance over random cache placement.

Keywords: Cache Placement, Hierarchical Clustering, Demand Analysis
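The clustering criterion mentioned in the abstract can be illustrated with a minimal sketch. It assumes each cluster is described by the set of ASes it neighbors, and it treats the Hamming distance between two neighbor sets as the number of ASes found in exactly one of them. The function names and the toy data are hypothetical, and the sketch only selects the closest pair of clusters; it is not the paper's full small-to-large merging procedure.

```python
from itertools import combinations


def hamming_distance(neighbors_a, neighbors_b):
    """Count the ASes that appear in exactly one of the two neighbor sets."""
    return len(neighbors_a ^ neighbors_b)


def closest_pair(neighbor_sets):
    """Return the pair of cluster names whose neighbor sets differ least."""
    return min(
        combinations(sorted(neighbor_sets), 2),
        key=lambda pair: hamming_distance(
            neighbor_sets[pair[0]], neighbor_sets[pair[1]]
        ),
    )


# Hypothetical toy data: cluster name -> set of neighboring ASes.
neighbor_sets = {
    "AS1": {"AS10", "AS20", "AS30"},
    "AS2": {"AS10", "AS20"},
    "AS3": {"AS40"},
}

# Candidate pair for the next merge under this simplified criterion.
print(closest_pair(neighbor_sets))
```

In this toy input, AS1 and AS2 share two of their neighbors and differ in only one, so they are the pair a merge step based on this criterion would consider first.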
