B+-tree construction on massive data with Hadoop

Data processing in the Socialist Republic of Vietnam (hereafter, Vietnam) is at an early stage, and a variety of problems remain to be solved. In the Vietnamese banking and financial sectors, where the management and storage of customer data and transaction histories are being emphasized as never before, the volume of data to be secured daily is increasing explosively due to rapid economic development, so the relevant authorities are seeking an efficient and reliable way to manage it. The B+-tree, a widely used variant of the B-tree, is considered one of the most suitable tree-type data structures for bulk data. Nevertheless, because constructing a B+-tree for massive data is quite time-consuming, the authors propose a parallel B+-tree construction system based on the Hadoop framework. The system is divided into three phases. First, the data are partitioned and distributed evenly so that each partition holds almost the same volume of data. Second, local B+-trees are constructed in parallel. Finally, the small-scale B+-trees are integrated into a complete B+-tree that covers the entire data set. The authors expect the proposed system to offer efficient index construction while reducing data processing time.
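The three phases can be illustrated with a minimal single-process sketch in Python. It is not the authors' implementation and does not use Hadoop: the partitions that would be handled by separate tasks are processed in a plain loop here. The node capacity `ORDER`, the quantile-based range partitioning, and all function names are assumptions made for illustration. The sketch also assumes distinct keys and ignores minimum node-fill rules, and its merge step rebuilds the upper levels over the collected leaves, whereas a real system would more likely reuse the local internal nodes.

```python
from bisect import bisect_right

ORDER = 4  # hypothetical maximum number of keys per node


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # data keys (leaf) or separator keys (internal)
        self.children = children  # None for a leaf
        self.next = None          # sibling link, used on the leaf level only


def min_key(node):
    """Smallest key stored in the subtree rooted at node."""
    while node.children is not None:
        node = node.children[0]
    return node.keys[0]


def partition(keys, n_parts):
    """Phase 1: range-partition keys into n_parts parts of nearly equal size.

    Boundaries are taken at quantiles of the full sorted input for simplicity;
    a distributed job would typically estimate them from a sample.
    """
    ordered = sorted(keys)
    bounds = [ordered[len(ordered) * i // n_parts] for i in range(1, n_parts)]
    parts = [[] for _ in range(n_parts)]
    for k in keys:
        parts[bisect_right(bounds, k)].append(k)
    return parts


def build_leaves(sorted_keys):
    """Pack sorted keys into linked leaves of at most ORDER keys each."""
    leaves = [Node(sorted_keys[i:i + ORDER])
              for i in range(0, len(sorted_keys), ORDER)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def build_upper(nodes):
    """Stack internal levels bottom-up until a single root remains."""
    while len(nodes) > 1:
        parents = []
        for i in range(0, len(nodes), ORDER + 1):
            group = nodes[i:i + ORDER + 1]
            separators = [min_key(child) for child in group[1:]]
            parents.append(Node(separators, group))
        nodes = parents
    return nodes[0]


def build_local_tree(part):
    """Phase 2: bulk-load one local B+-tree from one non-empty partition."""
    return build_upper(build_leaves(sorted(part)))


def leaves_of(root):
    """Return the leaves of a tree from left to right via the sibling links."""
    node = root
    while node.children is not None:
        node = node.children[0]
    out = []
    while node is not None:
        out.append(node)
        node = node.next
    return out


def merge_trees(roots):
    """Phase 3: integrate local trees, given in key-range order, into one tree."""
    leaves = [leaf for root in roots for leaf in leaves_of(root)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right  # link the last leaf of one tree to the next tree
    return build_upper(leaves)


def build_global_tree(keys, n_parts):
    parts = [p for p in partition(keys, n_parts) if p]  # skip empty parts
    return merge_trees([build_local_tree(p) for p in parts])


def contains(root, key):
    """Point lookup: descend by separator keys, then check the leaf."""
    node = root
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys
```

Because the partitions cover disjoint, ordered key ranges, the local trees can be joined simply by concatenating their leaf levels in partition order, which is why the even range partitioning in the first phase matters for the later phases.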
