BR-Tree: A Scalable Prototype for Supporting Multiple Queries of Multidimensional Data

Multidimensional data indexing has recently received much research attention in centralized systems. However, providing an integrated structure that supports multiple query types on multidimensional data in a distributed environment remains a nascent area of research. In this paper, we propose a new data structure, called BR-tree (Bloom-filter-based R-tree), and implement a prototype of it in a distributed system. A BR-tree node, viewed as an extension of the traditional R-tree node structure, incorporates space-efficient Bloom filters to facilitate fast membership queries. The proposed BR-tree simultaneously supports not only the existing point and range queries, but also cover and bound queries, which can potentially benefit various data indexing services. Compared with previous data structures, BR-tree achieves space efficiency and provides quick response (≤ O(log n)) on these four types of queries. Our extensive experiments in a distributed environment further validate the practicality and efficiency of the proposed BR-tree structure.
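To make the node structure concrete, the following is a minimal single-machine sketch of an R-tree-style node that also carries a Bloom filter. It is an illustration under stated assumptions, not the prototype described in the paper: the names `BloomFilter`, `BRNode`, `make_leaf`, `make_parent`, `point_query`, and `range_query` are hypothetical, the filter parameters `m` and `k` are arbitrary, and building a parent's filter as the bitwise union of its children's filters is an assumption of this sketch. Only point and range queries are shown; cover and bound queries are omitted.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner)


class BloomFilter:
    """Bit-vector Bloom filter; m and k are illustrative, not tuned."""

    def __init__(self, m: int = 256, k: int = 3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k bit positions by salting one hash with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))

    def union(self, other: "BloomFilter") -> None:
        # Valid only when both filters share m, k, and hash functions.
        self.bits |= other.bits


@dataclass
class BRNode:
    mbr: Rect                                            # minimum bounding rectangle
    bloom: BloomFilter = field(default_factory=BloomFilter)
    children: List["BRNode"] = field(default_factory=list)
    items: List[Point] = field(default_factory=list)     # used by leaves only


def _norm(p) -> Point:
    # Normalize coordinates so (1, 2) and (1.0, 2.0) hash identically.
    return tuple(float(x) for x in p)


def make_leaf(points) -> BRNode:
    pts = [_norm(p) for p in points]
    dims = range(len(pts[0]))
    lo = tuple(min(p[d] for p in pts) for d in dims)
    hi = tuple(max(p[d] for p in pts) for d in dims)
    node = BRNode(mbr=(lo, hi), items=pts)
    for p in pts:
        node.bloom.add(p)
    return node


def make_parent(children) -> BRNode:
    dims = range(len(children[0].mbr[0]))
    lo = tuple(min(c.mbr[0][d] for c in children) for d in dims)
    hi = tuple(max(c.mbr[1][d] for c in children) for d in dims)
    node = BRNode(mbr=(lo, hi), children=list(children))
    for c in children:
        node.bloom.union(c.bloom)  # assumption: parent filter = union of children
    return node


def point_query(node: BRNode, p) -> bool:
    p = _norm(p)
    if p not in node.bloom:        # a Bloom filter has no false negatives
        return False
    if not node.children:
        return p in node.items     # exact check removes false positives
    return any(point_query(c, p) for c in node.children)


def _overlaps(a: Rect, b: Rect) -> bool:
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d] for d in range(len(a[0])))


def range_query(node: BRNode, rect: Rect) -> List[Point]:
    if not _overlaps(node.mbr, rect):
        return []
    if not node.children:
        return [p for p in node.items
                if all(rect[0][d] <= p[d] <= rect[1][d] for d in range(len(p)))]
    return [p for c in node.children for p in range_query(c, rect)]
```

In this sketch a point query prunes any subtree whose filter reports the item absent, while a range query falls back on the usual bounding-rectangle overlap test. The exact check at the leaf is a design choice of the sketch: it keeps Bloom-filter false positives from reaching the caller at the cost of storing the items themselves.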
