Exploiting locality for scalable information retrieval in peer-to-peer networks

An important problem in unstructured peer-to-peer (P2P) networks is the efficient content-based retrieval of documents shared by other peers. However, existing search mechanisms do not scale well because they either flood the network with queries or require some form of global knowledge. We propose the Intelligent Search Mechanism (ISM), an efficient, scalable, yet simple mechanism for improving information retrieval in P2P systems. Our mechanism is efficient because it is bounded by the number of neighbors, and scalable because no global knowledge needs to be maintained. ISM consists of four components: a Profiling Structure, which logs queryhit messages coming from neighbors; a Query Similarity function, which calculates the similarity of past queries to a new query; RelevanceRank, an online neighbor ranking function; and a Search Mechanism, which forwards queries to selected neighbors. We deploy and compare ISM with a number of other distributed search techniques in static and dynamic environments. Our experiments are performed with real data over Peerware, our middleware simulation infrastructure, which is deployed on 75 workstations. Our results indicate that ISM outperforms its competitors and that in some cases it achieves a 100% recall rate while using only half of the network resources required by its competitors. Its performance is also superior with respect to total query response time, and the algorithm exhibits learning behavior as nodes acquire more knowledge. Finally, ISM works well in dynamic network topologies and in environments with replicated data sources.
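The interplay of the four components can be illustrated with a minimal sketch. The abstract does not specify the formulas, so everything below is an assumption made for illustration: query similarity is taken to be cosine similarity over bags of terms, the ranking is a similarity-weighted sum of past hit counts, and the names `Peer`, `record_queryhit`, `relevance_rank`, `select_neighbors`, and the parameter `k` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine_similarity(a: str, b: str) -> float:
    """Assumed similarity measure: cosine over bags of lowercase terms."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


class Peer:
    """Hypothetical peer that keeps only local knowledge about its neighbors."""

    def __init__(self, neighbors, k=2):
        self.neighbors = list(neighbors)
        self.k = k  # number of neighbors a query is forwarded to (assumed)
        # Profiling structure: neighbor -> list of (past query, number of hits)
        self.profile = defaultdict(list)

    def record_queryhit(self, neighbor, query, hits):
        """Log a queryhit that came back through the given neighbor."""
        self.profile[neighbor].append((query, hits))

    def relevance_rank(self, neighbor, query):
        """Assumed ranking: past hit counts weighted by similarity to the new query."""
        return sum(
            cosine_similarity(past, query) * hits
            for past, hits in self.profile[neighbor]
        )

    def select_neighbors(self, query):
        """Search step: pick the k best-ranked neighbors to forward the query to."""
        ranked = sorted(
            self.neighbors,
            key=lambda n: self.relevance_rank(n, query),
            reverse=True,
        )
        return ranked[: self.k]
```

In this sketch a peer that has seen neighbor `A` answer "linux kernel patch" would prefer `A` for the new query "linux kernel", and the ranking cost grows with the number of neighbors and logged queryhits rather than with the size of the network.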
