Answering similarity queries in peer-to-peer networks

A variety of peer-to-peer (P2P) systems for sharing digital information are currently available and most of them perform searching by exact key matching. In this paper we focus on similarity searching and describe FuzzyPeer, a generic broadcast-based P2P system which supports a wide range of fuzzy queries. As a case study we present an image retrieval application implemented on top of FuzzyPeer. Users provide sample images whose sets of features are propagated through the peers. The answer consists of the top-k most similar images within the query horizon. In our system the participation of peers is ad hoc and dynamic, their functionality is symmetric and there is no centralized index. In order to avoid flooding the network with messages, we develop a technique that takes advantage of the fuzzy nature of the queries. Specifically, some queries are ''frozen'' inside the network, and are satisfied by the streaming results of similar queries that are already running. We describe several optimization techniques for single and multiple-attribute queries, and study their tradeoffs. We evaluate the performance of our algorithms by a prototype implementation on our P2P platform and a simulated large-scale network. Our results suggest that by reusing the existing streams, the scalability of the system improves both in terms of number of nodes and query throughput.

[1]  James Ze Wang,et al.  Content-based image indexing and searching using Daubechies' wavelets , 1998, International Journal on Digital Libraries.

[2]  Hector Garcia-Molina,et al.  Designing a super-peer network , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[3]  Divyakant Agrawal,et al.  A peer-to-peer framework for caching range queries , 2004, Proceedings. 20th International Conference on Data Engineering.

[4]  Beng Chin Ooi,et al.  Answering similarity queries in peer-to-peer networks , 2004, WWW Alt. '04.

[5]  Ian Clarke,et al.  Protecting Free Expression Online with Freenet , 2002, IEEE Internet Comput..

[6]  Antony I. T. Rowstron,et al.  Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems , 2001, Middleware.

[7]  Alex Pentland,et al.  Photobook: Content-based manipulation of image databases , 1996, International Journal of Computer Vision.

[8]  Mark Handley,et al.  A scalable content-addressable network , 2001, SIGCOMM 2001.

[9]  Michalis Faloutsos,et al.  On power-law relationships of the Internet topology , 1999, SIGCOMM '99.

[10]  Samuel Madden,et al.  Continuously adaptive continuous queries over streams , 2002, SIGMOD '02.

[11]  Beng Chin Ooi,et al.  BestPeer: a self-configurable peer-to-peer system , 2002, Proceedings 18th International Conference on Data Engineering.

[12]  Luis Gravano,et al.  Optimizing queries over multimedia repositories , 1996, SIGMOD 1996.

[13]  Sandhya Dwarkadas,et al.  Peer-to-peer information retrieval using self-organizing semantic overlay networks , 2003, SIGCOMM '03.

[14]  David J. DeWitt,et al.  NiagaraCQ: a scalable continuous query system for Internet databases , 2000, SIGMOD 2000.

[15]  Robert Morris,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM 2001.

[16]  Ronald Fagin,et al.  Combining fuzzy information from multiple systems (extended abstract) , 1996, PODS.

[17]  Beng Chin Ooi,et al.  An adaptive peer-to-peer network for distributed caching of OLAP results , 2002, SIGMOD '02.

[18]  Hector Garcia-Molina,et al.  Comparing Hybrid Peer-to-Peer Systems , 2001, VLDB.

[19]  Christos Faloutsos,et al.  QBIC project: querying images by content, using color, texture, and shape , 1993, Electronic Imaging.

[20]  Christopher R. Palmer,et al.  Generating network topologies that obey power laws , 2000, Globecom '00 - IEEE. Global Telecommunications Conference. Conference Record (Cat. No.00CH37137).

[21]  Ronald Fagin,et al.  Combining Fuzzy Information from Multiple Systems , 1999, J. Comput. Syst. Sci..

[22]  Hector Garcia-Molina,et al.  Improving search in peer-to-peer networks , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[23]  Hector Garcia-Molina,et al.  Routing indices for peer-to-peer systems , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[24]  Beng Chin Ooi,et al.  Indexing the Distance: An Efficient Method to KNN Processing , 2001, VLDB.