Nearest neighbor search in metric spaces through Content-Addressable Networks

Most Peer-to-Peer search techniques proposed in recent years have focused on single-key retrieval. However, similarity search in metric spaces is an important paradigm for content-based retrieval in many applications. In this paper we introduce an extension of the well-known Content-Addressable Network paradigm to support the storage and retrieval of more generic metric space objects. In particular, we address the problem of executing nearest neighbor queries and propose three different query execution algorithms. An extensive experimental study on real-life data sets explores the performance characteristics of the proposed algorithms and shows their advantages and disadvantages. © 2007 Elsevier Ltd. All rights reserved.
