Full-text federated search in peer-to-peer networks

Peer-to-peer (P2P) networks integrate autonomous computing resources without requiring a central coordinating authority, which makes them a potentially robust and scalable model for providing federated search capability to large-scale networks of text-based digital libraries. However, P2P networks have so far mostly used simple search techniques based on document names or controlled-vocabulary terms, and have provided very limited support for full-text search of document contents.
