Similarity Queries on Structured Data in Structured Overlays

Structured P2P systems based on distributed hash tables (DHTs) are a popular choice for building large-scale data management systems. In general, they support only exact-match queries, but data heterogeneity often demands more complex query types, particularly similarity queries. In this work, we propose a vertical data organization that allows efficient processing of similarity queries at both the instance and the schema level, and we introduce corresponding physical similarity operators. We show that our novel approach is suitable for use with P-Grid, an example of a robust, large-scale, self-organizing P2P system.
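
The abstract does not spell out the storage scheme or the operators, so the following is only a minimal sketch under our own assumptions of how a vertical organization over a DHT could support both kinds of similarity query. Each tuple is split into (oid, attribute, value) triples. Each triple is published under a hashed key for exact lookup, under the q-grams of its value (instance level), and under the q-grams of its attribute name (schema level). A similarity selection gathers candidate triples through shared q-grams, prunes them with the q-gram count filter described by Gravano et al. (VLDB 2001), and verifies the survivors with edit distance. `ToyDHT`, `insert_tuple`, `similar_values`, `similar_attributes`, the key prefixes, and `Q = 3` are hypothetical names and choices. This is not the paper's implementation and does not model P-Grid's trie-based routing.

```python
import hashlib
from collections import Counter, defaultdict

Q = 3  # q-gram length; an illustrative assumption, not taken from the paper


def key_id(key: str) -> str:
    """Hash a logical key to a DHT identifier (SHA-1 here, as an assumption)."""
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


class ToyDHT:
    """Hypothetical single-process stand-in for a DHT: put/get by hashed key only."""

    def __init__(self) -> None:
        self._store = defaultdict(list)

    def put(self, key: str, item) -> None:
        self._store[key_id(key)].append(item)

    def get(self, key: str) -> list:
        return list(self._store.get(key_id(key), []))


def qgrams(s: str, q: int = Q) -> list:
    """Padded q-grams: a string of length n yields n + q - 1 grams."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using single-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def insert_tuple(dht: ToyDHT, oid: str, record: dict) -> None:
    """Vertical decomposition: publish one (oid, attribute, value) triple per field."""
    for attr, value in record.items():
        triple = (oid, attr, str(value))
        dht.put(f"v:{attr}={value}", triple)  # exact-match key
        for g in qgrams(triple[2]):
            dht.put(f"vg:{g}", triple)  # instance level: one posting per value q-gram
        for g in qgrams(attr):
            dht.put(f"ag:{g}", triple)  # schema level: one posting per name q-gram


def _similar(dht: ToyDHT, prefix: str, field: int, query: str, k: int) -> list:
    """Filter-and-verify selection: triples whose field is within edit distance k."""
    if len(query) - 1 - (k - 1) * Q < 1:
        # Beyond this bound a true match might share no q-gram with the query.
        raise ValueError("k too large for q-gram filtering of this query")
    common = Counter()
    for g, in_query in Counter(qgrams(query)).items():
        for triple, in_text in Counter(dht.get(prefix + g)).items():
            common[triple] += min(in_query, in_text)  # multiset intersection size
    hits = []
    for triple, shared in common.items():
        text = triple[field]
        # Count filter: a necessary condition for edit distance <= k.
        if shared >= max(len(query), len(text)) - 1 - (k - 1) * Q \
                and edit_distance(query, text) <= k:
            hits.append(triple)
    return sorted(hits)


def similar_values(dht: ToyDHT, query: str, k: int) -> list:
    """Instance level: triples whose value is within edit distance k of query."""
    return _similar(dht, "vg:", 2, query, k)


def similar_attributes(dht: ToyDHT, query: str, k: int) -> list:
    """Schema level: attribute names within edit distance k of query."""
    return sorted({t[1] for t in _similar(dht, "ag:", 1, query, k)})


dht = ToyDHT()
insert_tuple(dht, "o1", {"name": "Berlin", "country": "Germany"})
insert_tuple(dht, "o2", {"name": "Bern", "country": "Switzerland"})
insert_tuple(dht, "o3", {"title": "Berlin", "cuntry": "Germany"})
print(similar_values(dht, "Berlin", 1))  # [('o1', 'name', 'Berlin'), ('o3', 'title', 'Berlin')]
print(similar_attributes(dht, "country", 1))  # ['country', 'cuntry']
```

Keeping one posting per q-gram occurrence lets the query count shared q-grams with multiset semantics, which is what the count filter's bound assumes. The schema-level query finds the misspelled attribute `cuntry` that an exact-match DHT lookup on `country` would miss.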
