Peer-to-Peer Information Retrieval: An Overview

Peer-to-peer technology is widely used for file sharing. In the past decade, a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these has seen widespread real-world adoption; thus, in contrast with file sharing, information retrieval is still dominated by centralized solutions. In this article, we provide an overview of the key challenges for peer-to-peer information retrieval and of the work done so far. We want to stimulate and inspire further research to overcome these challenges. Overcoming them would open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralized client-server solutions in terms of scalability, performance, user satisfaction, and freedom.
