Traffic Modeling and Proportional Partial Caching for Peer-to-Peer Systems

Peer-to-peer (P2P) file-sharing systems generate a major portion of Internet traffic, and this portion is expected to increase in the future. We explore the potential of deploying proxy caches in different autonomous systems (ASes) with the goal of reducing the cost incurred by Internet service providers and alleviating the load on the Internet backbone. We conduct an eight-month measurement study to analyze the P2P traffic characteristics that are relevant to caching, such as object popularity, popularity dynamics, and object size. Our study shows that the popularity of P2P objects can be modeled by a Mandelbrot-Zipf distribution, and that several workloads exist in P2P traffic. Guided by our findings, we develop a novel caching algorithm for P2P traffic that is based on object segmentation and on proportional partial admission and eviction of objects. Our trace-based simulations show that, with a relatively small cache size, our algorithm achieves a byte hit rate of up to 35%, which is close to the byte hit rate achieved by an off-line optimal algorithm with complete knowledge of future requests. Our results also show that our algorithm achieves a byte hit rate that is at least 40% higher than, and up to three times, the byte hit rate of common Web caching algorithms. Furthermore, our algorithm is robust in the face of aborted downloads, which are common in P2P systems.
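To make the popularity model concrete, the following is a minimal sketch of the standard Mandelbrot-Zipf form, in which the object of rank i is requested with probability p(i) = K / (i + q)^alpha, where K is a normalizing constant. The function names and the parameter values in the usage note are illustrative assumptions, not values taken from the measurement study.

```python
def mandelbrot_zipf_pmf(n, alpha, q):
    """Request probabilities for ranks 1..n under p(i) = K / (i + q)**alpha.

    alpha controls the skewness and q >= 0 flattens the head of the
    distribution; q = 0 gives a plain Zipf-like distribution.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if q < 0:
        raise ValueError("q must be non-negative")
    weights = [1.0 / (rank + q) ** alpha for rank in range(1, n + 1)]
    total = sum(weights)  # 1 / K, the normalizing constant
    return [w / total for w in weights]


def head_mass(pmf, k):
    """Fraction of all requests that go to the k most popular objects."""
    return sum(pmf[:k])


# Hypothetical parameters, chosen only for illustration.
zipf_like = mandelbrot_zipf_pmf(n=1000, alpha=1.0, q=0.0)
flattened = mandelbrot_zipf_pmf(n=1000, alpha=1.0, q=20.0)
```

Comparing head_mass(zipf_like, 10) with head_mass(flattened, 10) shows the effect of q: a larger q moves request mass away from the most popular objects, so a cache that holds only the top-ranked objects captures a smaller share of requests than it would under a Zipf-like model with the same alpha.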
