MPDBS: A multi-level parallel database system based on B-Tree

Parallel processing systems have been extensively developed and are used in numerous commercial servers for large-scale data analysis. However, these systems cannot achieve scalability, reliability and efficiency simultaneously. Motivated by this observation, we design a Multi-level Parallel Database System based on the B-tree structure (MPDBS) for large-scale structured and semi-structured data, and propose a corresponding multi-level index scheme (MLIS). Based on the MPDBS framework and the MLIS scheme, the system can execute analysis tasks and full-text queries in parallel and efficiently, while greatly reducing network I/O and disk I/O. The optimal architecture of MPDBS is also derived mathematically. Experimental results show that, given the same hardware configuration and the TPC-H benchmark, MPDBS reduces query latency (for statistical, keyword and point queries) on 200 GB of commercial data by 95% compared with Hive running on the Hadoop Distributed File System (HDFS).