Efficient Peer-to-Peer Similarity Query Processing for High-dimensional Data

Objects such as digital images, text documents, and DNA sequences are usually represented in a high-dimensional feature space. A fundamental issue in peer-to-peer (P2P) systems is supporting efficient similarity search over high-dimensional data in metric spaces. Prior work suffers from fundamental limitations, such as poor adaptability to highly dynamic networks, poor search efficiency under skewed data, and large maintenance overhead. In this study, we propose an efficient scheme, Dragon, to support P2P similarity search in metric spaces. Dragon achieves its efficiency through the following designs: 1) Dragon is built on our previously designed P2P network, Phoenix, which achieves optimal routing efficiency in dynamic scenarios. 2) We design a locality-preserving naming algorithm and a routing tree for each peer in Phoenix to support range queries. A radius-estimation method is proposed to transform a kNN query into a range query. 3) A load-balancing algorithm is given to support robust query processing under skewed data distributions. Extensive experiments verify the superiority of Dragon over existing works.
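The idea of answering a kNN query through range queries can be illustrated with a minimal, centralized sketch. This is not Dragon's radius-estimation method, which the abstract does not specify; it assumes a simple radius-doubling strategy, a Euclidean metric, and an in-memory point list, and the names `range_query`, `knn_via_range`, `r0`, and `growth` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as an example metric."""
    return math.dist(a, b)


def range_query(points, q, r, dist=euclidean):
    """Return all points within distance r of the query point q."""
    return [p for p in points if dist(p, q) <= r]


def knn_via_range(points, q, k, r0=1.0, growth=2.0, dist=euclidean):
    """Answer a kNN query by issuing range queries with a growing radius.

    Assumption: the radius starts at r0 and is multiplied by `growth`
    until at least k points fall inside the ball around q. A real system
    would estimate the starting radius instead of guessing it.
    """
    if r0 <= 0 or growth <= 1:
        raise ValueError("r0 must be positive and growth must exceed 1")
    if k <= 0 or not points:
        return []
    k = min(k, len(points))  # cannot return more points than exist
    r = r0
    while True:
        hits = range_query(points, q, r, dist)
        if len(hits) >= k:
            # The ball holds at least k points, so the k nearest are inside it.
            return sorted(hits, key=lambda p: dist(p, q))[:k]
        r *= growth


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (3, 0), (10, 0)]
    print(knn_via_range(data, (0, 0), k=3))
```

A good initial radius matters in a distributed setting, since every enlarged range query costs another round of routing; that cost is the motivation for estimating the radius rather than growing it blindly.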
