Finding rare data objects in P2P file-sharing systems

Peer-to-peer file-sharing systems have hundreds of thousands of users sharing petabytes of data; however, their search functionality is limited. In general, query results contain many references to the same data object. These references are grouped, and the size of the group (the number of references it contains) is used as the ranking metric. Although group size is effective in finding popular data, it works poorly for rare, less popular data. Other ranking functions, such as precision and cosine similarity, are more appropriate in this case. Through extensive simulation, the authors show that these ranking functions yield a significant performance benefit in finding rare data.
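
To make the contrast between the ranking functions concrete, here is a minimal Python sketch. The abstract does not give the authors' exact definitions, so everything here is an assumption for illustration: results are modelled as (object key, descriptor) pairs, precision is taken as the fraction of a group's descriptor terms that match the query, and cosine similarity is computed between raw term-count vectors of the query and the group. The names `RESULTS`, `group_results`, and `rank` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical query results: (object key, descriptor) pairs.
# Object "A" is popular (three references); object "B" is rare (one reference).
RESULTS = [
    ("A", "live concert remix bonus track"),
    ("A", "live concert remix bonus track"),
    ("A", "live concert remix bonus track"),
    ("B", "live concert"),
]

def group_results(results):
    """Group references to the same data object by its key."""
    groups = defaultdict(list)
    for key, descriptor in results:
        groups[key].append(descriptor.lower().split())
    return dict(groups)

def group_size(query_terms, descriptors):
    """Score a group by the number of references it contains."""
    return len(descriptors)

def precision(query_terms, descriptors):
    """Assumed definition: share of the group's terms that match the query."""
    terms = [t for d in descriptors for t in d]
    if not terms:
        return 0.0
    wanted = set(query_terms)
    return sum(1 for t in terms if t in wanted) / len(terms)

def cosine(query_terms, descriptors):
    """Cosine similarity between query and group term-count vectors."""
    q = Counter(query_terms)
    g = Counter(t for d in descriptors for t in d)
    dot = sum(q[t] * g[t] for t in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in g.values()))
    return dot / norm if norm else 0.0

def rank(results, query, score):
    """Return object keys ordered from best to worst under `score`."""
    query_terms = query.lower().split()
    groups = group_results(results)
    return sorted(groups, key=lambda k: score(query_terms, groups[k]), reverse=True)

if __name__ == "__main__":
    print(rank(RESULTS, "live concert", group_size))  # ['A', 'B']
    print(rank(RESULTS, "live concert", precision))   # ['B', 'A']
    print(rank(RESULTS, "live concert", cosine))      # ['B', 'A']
```

On this toy data, group size puts the widely replicated object first, while the two content-based scores favour the rare object whose descriptor matches the query exactly. This only illustrates the idea; it says nothing about the magnitude of the benefit reported in the simulations.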
