Efficient Blacklisting and Pollution-Level Estimation in P2P File-Sharing Systems

P2P file-sharing systems are susceptible to pollution attacks, in which corrupted copies of content are aggressively introduced into the system. Recent research indicates that pollution is extensive in several file-sharing systems. In this paper we propose an efficient measurement methodology for identifying the sources of pollution and estimating the levels of polluted content. The methodology can be used to blacklist polluters, to evaluate the success of a pollution campaign, to reduce the bandwidth wasted on transmitting polluted content, and to remove noise from content measurement data. It is efficient in that it does not require downloading and analyzing binary content, which would be expensive in bandwidth and in computational and human resources. Instead, it harvests metadata from the file-sharing system and processes the harvested metadata offline. We apply the technique to the FastTrack/Kazaa file-sharing network. Analyzing the false positives and false negatives, we conclude that the methodology is efficient and accurate.
