FB+-tree for Big Data Management

Decades of research and experience in managing large databases, together with today's strong interest in massive data sets, have pushed many indexing methods to new limits. In extensive experiments, the FB+-tree has shown excellent potential for in-memory management of big data. The FB+-tree builds a fast indexing structure from multi-level key ranges; this article explains the idea by applying it to the B+-tree. With the FB+-tree, point searches and range searches benefit from early termination when the requested data does not exist. Range searches can be processed depth-first or breadth-first. A group of multiple searches can be processed in a single pass over the indexing structure to minimize total cost. Implementation options and strategies are explained to show the flexibility of the technique, which allows easy adaptation and high efficiency. The FB+-tree can be tuned to speed up queries directed at popular index ranges or at index ranges of particular interest to the user. Extended experiments are presented that specifically test its adaptability and performance for big data.
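
To make the idea of key ranges and early termination concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a static, bulk-loaded tree in which every node records the smallest and largest key actually stored beneath it. The names `Node`, `build`, `search`, and `range_search`, and the `fanout` parameter, are hypothetical and chosen only for this example. A point search stops as soon as the key falls outside a node's range or into a gap between child ranges, and a depth-first range search skips subtrees whose range does not overlap the query.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    lo: int  # smallest key actually stored under this node (assumed layout)
    hi: int  # largest key actually stored under this node (assumed layout)
    keys: List[int] = field(default_factory=list)          # used by leaves only
    children: List["Node"] = field(default_factory=list)   # used by internal nodes only


def build(sorted_keys: List[int], fanout: int = 4) -> Node:
    """Bulk-load a static tree from non-empty, sorted, distinct keys."""
    if not sorted_keys:
        raise ValueError("need at least one key")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    chunks = [sorted_keys[i:i + fanout] for i in range(0, len(sorted_keys), fanout)]
    level = [Node(lo=c[0], hi=c[-1], keys=c) for c in chunks]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Node(lo=g[0].lo, hi=g[-1].hi, children=g) for g in groups]
    return level[0]


def search(root: Node, key: int) -> Tuple[bool, int]:
    """Point search; returns (found, number of nodes visited)."""
    node, visited = root, 1
    if key < node.lo or key > node.hi:
        return False, visited  # outside the root's key range: stop at once
    while node.children:
        # First child whose upper bound is >= key; exists because key <= node.hi.
        i = bisect_left([c.hi for c in node.children], key)
        child = node.children[i]
        if key < child.lo:
            return False, visited  # key lies in a gap between child ranges
        node, visited = child, visited + 1
    i = bisect_left(node.keys, key)
    return (i < len(node.keys) and node.keys[i] == key), visited


def range_search(node: Node, a: int, b: int) -> List[int]:
    """Depth-first range search for keys in [a, b], pruning by key range."""
    if b < node.lo or a > node.hi:
        return []  # no overlap with this subtree: prune it
    if not node.children:
        return node.keys[bisect_left(node.keys, a):bisect_right(node.keys, b)]
    out: List[int] = []
    for child in node.children:
        out.extend(range_search(child, a, b))
    return out
```

In this sketch, a lookup for a key that falls between two children's ranges returns at the internal node without ever touching a leaf, which is the kind of saving that early termination for non-existent data is meant to provide. A breadth-first variant of `range_search` would replace the recursion with a queue of nodes.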
