Clustering-Based, Load Balanced Source Selection for CBIR in P2P Networks

In peer-to-peer (P2P) networks, computers with equal rights form a logical (overlay) network to provide a common service that lies beyond the capacity of any single participant. Efficient similarity search is widely regarded as an open research problem for P2P systems. One way to address it is data source selection: each peer summarizes the data it contributes to the network, typically producing one summary per peer. When a query is processed, these summaries are used to choose the peers (data sources) most likely to contribute to the query result, and only those peers are contacted. This article makes several contributions. We extend earlier work with a data source selection method for high-dimensional vector data and compare different peer ranking schemes. Furthermore, we present two methods that exchange data between peers in progressive steps to improve each peer's summary and thereby the system's performance. Finally, we examine the effect of these data exchange methods on load balancing.
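
The general idea of summary-based source selection can be illustrated with a minimal Python sketch. It is not the method of this article: the summary format (per-cluster counts against a shared set of centroids), the ranking rule (count in the query's nearest cluster), and all names (`nearest`, `summarize`, `rank_peers`) are assumptions chosen for illustration only.

```python
import math

def nearest(centroids, v):
    """Index of the centroid closest to v (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], v))

def summarize(centroids, vectors):
    """Hypothetical peer summary: number of the peer's vectors per cluster."""
    counts = [0] * len(centroids)
    for v in vectors:
        counts[nearest(centroids, v)] += 1
    return counts

def rank_peers(centroids, summaries, query):
    """Order peer ids by how many vectors they hold in the query's cluster."""
    c = nearest(centroids, query)
    return sorted(summaries, key=lambda p: summaries[p][c], reverse=True)

# Toy data: two shared centroids and three peers with 2-D feature vectors.
centroids = [(0.0, 0.0), (10.0, 10.0)]
peers = {
    "A": [(0.5, 0.2), (9.0, 9.5)],
    "B": [(9.8, 10.1), (10.2, 9.9), (11.0, 10.0)],
    "C": [(0.1, 0.1), (0.3, -0.2)],
}
summaries = {p: summarize(centroids, vs) for p, vs in peers.items()}
ranking = rank_peers(centroids, summaries, (9.5, 9.5))
print(ranking)
```

In this toy setup only the top-ranked peers would be contacted for the query, which is the cost saving that source selection aims for; a real system would need a more careful ranking rule for queries that fall near cluster boundaries.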
