Fast retrieval of high-dimensional feature vectors in P2P networks using compact peer data summaries

The retrieval facilities of most Peer-to-Peer (P2P) systems are limited to queries based on a unique identifier or a small set of keywords. The techniques used for this purpose are hardly applicable to content-based image retrieval (CBIR) in a P2P network. Furthermore, we argue that the curse of dimensionality and the high communication overhead prevent the adaptation of multidimensional search trees or fast sequential scan techniques to P2P CBIR. In the present paper we propose two compact data representations that can be distributed in a P2P network and used as the basis for source selection. This makes it possible to communicate with only a small fraction of all peers during query processing without significantly degrading result quality. We also present experimental results that support our approach.
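The general idea of summary-based source selection can be illustrated with a minimal sketch. The abstract does not specify the two proposed representations, so everything below is an assumption made purely for illustration: each peer is summarized by a histogram counting how many of its feature vectors fall closest to each of a set of globally known reference points (`centroids`), and a query ranks peers by how many vectors they hold in the cells nearest to the query. The names `summarize_peer` and `select_peers` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_centroid(v: Vector, centroids: List[Vector]) -> int:
    """Index of the reference point closest to v (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))


def summarize_peer(vectors: List[Vector], centroids: List[Vector]) -> List[int]:
    """Assumed summary: number of a peer's vectors per nearest reference point."""
    counts = [0] * len(centroids)
    for v in vectors:
        counts[nearest_centroid(v, centroids)] += 1
    return counts


def select_peers(
    query: Vector,
    summaries: Dict[str, List[int]],
    centroids: List[Vector],
    max_peers: int,
) -> List[str]:
    """Rank peers using only their summaries and return the top max_peers.

    Cells are ordered by distance of their reference point to the query;
    peers are compared by their counts in that cell order (larger first).
    """
    cell_order = sorted(
        range(len(centroids)), key=lambda i: math.dist(query, centroids[i])
    )
    ranked = sorted(
        summaries,
        key=lambda peer: tuple(-summaries[peer][i] for i in cell_order),
    )
    return ranked[:max_peers]


# Toy data: two reference points and three peers (all values are made up).
centroids = [(0.0, 0.0), (10.0, 10.0)]
peer_data = {
    "A": [(0.0, 1.0), (1.0, 0.0), (9.0, 9.0)],
    "B": [(10.0, 9.0), (11.0, 10.0)],
    "C": [(0.0, 0.0)],
}
summaries = {p: summarize_peer(vs, centroids) for p, vs in peer_data.items()}
selected = select_peers((9.0, 10.0), summaries, centroids, max_peers=2)
```

In this toy setup only the peers in `selected` would be contacted for the query, and the peer holding no vectors near the query is skipped. Each summary has one integer per reference point regardless of how many vectors a peer stores, which is what keeps it cheap to distribute.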
