Scalability comparison of Peer-to-Peer similarity search structures

Due to the increasing complexity of current digital data, similarity search has become a fundamental computational task in many applications. Unfortunately, its costs are still high and grow linearly on single-server structures, which prevents their efficient application to large data volumes. In this paper, we briefly describe four recent scalable distributed techniques for similarity search and study their performance in executing queries on three different datasets. Though all the methods employ parallelism to speed up query execution, the experiments reveal different advantages for different objectives. The reported results can help in choosing the best implementation for a specific application. They can also be used for designing new and better indexing structures in the future.
