Exploring the Use of BitTorrent as the Basis for a Large Trace Repository

Motivated by the need to deploy a public repository of multi-gigabyte trace files, we studied the BitTorrent protocol’s ability to disseminate very large files among peers. BitTorrent is a popular peer-to-peer protocol that allows parallel downloads of large files. In this paper, we analyze user activity on BitTorrent over a four-month period with respect to supportable file sizes, file popularity, session lengths, transfer speeds, and the likelihood of service-interrupting flash crowds. Our results show that file sizes tend to be on the order of gigabytes, far larger than those shared through other peer-to-peer applications. File popularity follows a distribution similar to that of other peer-to-peer file-sharing systems. Unlike in other systems, the majority of users require multiple sessions to retrieve a file, and they are willing to remain connected to the system for a very long time. Most users we observed appear to have asymmetric Internet connections, and their generally poor upload performance is mitigated by their willingness to remain connected and to upload for far longer than they spent downloading. We found that service disruption due to flash crowds is unlikely, as the vast majority of users were able to begin contributing resources back to the system within seconds of connecting. Our results indicate that BitTorrent provides an effective foundation for disseminating files that are multi-gigabyte or larger, provided that more sophisticated features such as versioning, availability, and content management are added.
