A Framework for Supporting Tree-Like Indexes on the Chord Overlay

With the explosive growth of data, to support efficient data management including queries and updates, the database system is expected to provide tree-like indexes, such as R-tree, M-tree, B+-tree, according to different types of data. In the distributed environment, the indexes have to be scattered across the compute nodes to improve reliability and scalability. Indexes can speed up queries, but they incur maintenance cost when updates occur. In the distributed environment, each compute node maintains a subset of an index tree, so keeping the communication cost small is more crucial, or else it occupies lots of network bandwidth and the scalability and availability of the database system are affected. Further, to achieve the reliability and scalability for queries, several replicas of the index are needed, but keeping the replicas consistent is not straightforward. In this paper, we propose a framework supporting tree-like indexes, based on Chord overlay, which is a popular P2P structure. The framework dynamically tunes the number of replicas of index to balance the query cost and the update cost. Several techniques are designed to improve the efficiency of updates without the cost of performance of the queries. We implement M-tree and R-tree in our framework, and extensive experiments on real- life and synthetic datasets verify the efficiency and scalability of our framework.

[1]  David R. Karger,et al.  Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web , 1997, STOC '97.

[2]  Karl Aberer,et al.  P-Grid: a self-organizing structured P2P system , 2003, SGMD.

[3]  David Novak,et al.  M-Chord: a scalable distributed similarity search structure , 2006, InfoScale '06.

[4]  Gang Chen,et al.  A Framework for supporting DBMS-like indexes in the cloud , 2011, Proc. VLDB Endow..

[5]  David R. Karger,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.

[6]  Alan P. Sexton,et al.  Bulk Loading the M-Tree to Enhance Query Performance , 2004, BNCOD.

[7]  Beng Chin Ooi,et al.  Efficient B-tree based indexing for cloud data processing , 2010, Proc. VLDB Endow..

[8]  Derong Shen,et al.  An Adaptive Distributed Index for Similarity Queries in Metric Spaces , 2012, WAIM.

[9]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[10]  Sameer A. Nene,et al.  A simple algorithm for nearest neighbor search in high dimensions , 1997 .

[11]  Hanan Samet,et al.  Using a distributed quadtree index in peer-to-peer networks , 2007, The VLDB Journal.

[12]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[13]  Hans-Arno Jacobsen,et al.  PNUTS: Yahoo!'s hosted data serving platform , 2008, Proc. VLDB Endow..

[14]  Mark Handley,et al.  A scalable content-addressable network , 2001, SIGCOMM '01.

[15]  Ashwin Machanavajjhala,et al.  P-ring: an efficient and robust P2P range index structure , 2007, SIGMOD '07.

[16]  Peter Druschel,et al.  Pastry: Scalable, distributed object location and routing for large-scale peer-to- , 2001 .

[17]  David Novak,et al.  Metric Index: An efficient and scalable solution for precise and approximate similarity search , 2011, Inf. Syst..

[18]  Beng Chin Ooi,et al.  Indexing multi-dimensional data in a cloud system , 2010, SIGMOD Conference.

[19]  Antony I. T. Rowstron,et al.  Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems , 2001, Middleware.

[20]  Anirban Mondal,et al.  P2PR-Tree: An R-Tree-Based Spatial Index for Peer-to-Peer Environments , 2004, EDBT Workshops.

[21]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[22]  Cyrus Shahabi,et al.  Voronoi-Based K Nearest Neighbor Search for Spatial Network Databases , 2004, VLDB.

[23]  Beng Chin Ooi,et al.  iDistance: An adaptive B+-tree based indexing method for nearest neighbor search , 2005, TODS.

[24]  Michal Batko,et al.  Distributed and Scalable Similarity Searching in Metric Spaces , 2004, EDBT Workshops.