mDHT: a multi-level-indexed DHT algorithm to extra-large-scale data retrieval on HDFS/Hadoop architecture

To meet the storage and fast-search needs of extra-large-scale energy monitoring and statistics data, we propose a multi-level-indexed distributed hash table (mDHT) algorithm and implement it with MapReduce on the open-standard HDFS/HBase platform. The approach stores energy consumption data in a columnar structure and builds a hashed index table to provide quick search and retrieval for extra-large-scale data processing systems. The hashed indexing scheme is implemented on a 3-node Hadoop cluster. Simulation experiments at scales of up to 48 million data records indicate that, when the data volume reaches 12 million to 48 million records, the proposed mDHT algorithm performs markedly better in data writing than a traditional SQL Server implementation. Compared with a single-indexed DHT (sDHT) application, the mDHT solution reduces data retrieval time by 24.5–48.6%. The multi-level-indexed DHT algorithm presented in this paper contributes a key technique for developing a fast search engine for extra-large-scale data on cloud storage architectures.
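
As a rough illustration of the idea of a multi-level hashed index over columnar data, the following minimal in-memory sketch hashes a record on two keys in turn and stores only row identifiers in the index. It is not the mDHT implementation described in this paper: the choice of index keys (`meter_id`, `day`), the column names, the bucket counts, and the class and function names are all hypothetical, and the HDFS/HBase and MapReduce layers are omitted entirely.

```python
import hashlib
from collections import defaultdict


def bucket(value: str, n_buckets: int) -> int:
    """Map a string to a stable bucket number in [0, n_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets


class MultiLevelIndex:
    """Toy two-level hashed index over a columnar store (illustrative only)."""

    def __init__(self, n_level1: int = 16, n_level2: int = 16):
        self.n1 = n_level1
        self.n2 = n_level2
        # Columnar layout: one list per column; a row id is a list position.
        # Column names are assumptions made for this example.
        self.columns = {"meter_id": [], "day": [], "kwh": []}
        # index[level-1 bucket][level-2 bucket] -> list of row ids
        self.index = defaultdict(lambda: defaultdict(list))

    def put(self, meter_id: str, day: str, kwh: float) -> int:
        """Append one record and register its row id in the index."""
        row_id = len(self.columns["meter_id"])
        self.columns["meter_id"].append(meter_id)
        self.columns["day"].append(day)
        self.columns["kwh"].append(kwh)
        b1 = bucket(meter_id, self.n1)
        b2 = bucket(day, self.n2)
        self.index[b1][b2].append(row_id)
        return row_id

    def get(self, meter_id: str, day: str) -> list:
        """Return kwh values for an exact (meter_id, day) match."""
        b1 = bucket(meter_id, self.n1)
        b2 = bucket(day, self.n2)
        # .get avoids creating empty buckets as a side effect of a lookup.
        candidates = self.index.get(b1, {}).get(b2, [])
        # Different keys can share a bucket, so verify each candidate row.
        return [
            self.columns["kwh"][r]
            for r in candidates
            if self.columns["meter_id"][r] == meter_id
            and self.columns["day"][r] == day
        ]


idx = MultiLevelIndex()
idx.put("m1", "2015-01-01", 3.5)
idx.put("m1", "2015-01-02", 4.0)
idx.put("m2", "2015-01-01", 1.25)
print(idx.get("m1", "2015-01-01"))
```

In this sketch a lookup inspects only the rows that fall in one second-level bucket, whereas an index on a single key would have to scan every row sharing the first-level bucket. That is the intuition behind comparing a multi-level index with a single-level one, under the assumptions stated above.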