BF-Matrix: A Secondary Index for the Cloud Storage

Although many kinds of NoSQL databases, also referred to as key-value stores, have been proposed, there is still no efficient solution to the problem of querying on non-key attributes. In this paper, we propose BF-Matrix, a hierarchical index composed of Bloom filters and B+ trees. Faced with massive data and large-scale clusters, the layered solution shortens the search path and makes the best use of scattered resources. Moreover, it can scale up and scale back as the data size and cluster scale change, and it confines the work of update and retrieval to a limited scope. To eliminate the risk of false negatives and to ensure that the index appears consistent, two rules are given to specify the behavior of index update and data retrieval. Experimental results demonstrate that our solution not only outperforms the state of the art but is also flexible enough to adapt to the cloud environment.
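To make the layered idea concrete, the following is a minimal sketch of a two-level secondary index, not the BF-Matrix design itself. It assumes one Bloom filter per storage node as the upper layer and uses a sorted list as a stand-in for each node's B+ tree. All class names, method names, and parameters (TwoLevelIndex, StorageNode, num_bits, num_hashes) are hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using salted SHA-256 hashes (illustrative)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes bit positions by salting the value with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


class StorageNode:
    """Lower layer: a sorted list stands in for a per-node B+ tree."""

    def __init__(self, name):
        self.name = name
        self.entries = []  # sorted (attribute_value, primary_key) pairs

    def insert(self, value, key):
        bisect.insort(self.entries, (value, key))

    def lookup(self, value):
        # (value,) sorts before every (value, key), so this finds the first match.
        i = bisect.bisect_left(self.entries, (value,))
        keys = []
        while i < len(self.entries) and self.entries[i][0] == value:
            keys.append(self.entries[i][1])
            i += 1
        return keys


class TwoLevelIndex:
    """Upper layer: one Bloom filter per node, used to skip nodes on lookup."""

    def __init__(self, node_names):
        self.nodes = {name: StorageNode(name) for name in node_names}
        self.filters = {name: BloomFilter() for name in node_names}

    def insert(self, node_name, value, key):
        self.nodes[node_name].insert(value, key)
        self.filters[node_name].add(value)

    def query(self, value):
        results = []
        for name, bloom in self.filters.items():
            # Only nodes whose filter reports a possible match are searched.
            if bloom.might_contain(value):
                results.extend(self.nodes[name].lookup(value))
        return results
```

In this sketch a Bloom filter false positive only costs one extra local lookup, because the local index confirms every match. A false negative would silently drop results, which is why the filter is updated on every insert here.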
