HMIBase: An Hierarchical Indexing System for Storing and Querying Big Data

Relational database management systems are usually deployed on single-node machines and impose strict limitations on data structure. As a result, they do not work well with big data, and NoSQL has been proposed as a solution. To make data querying more efficient, NoSQL databases use indexes and memory cache techniques. In this paper, we propose a hierarchical indexing mechanism and a prototype distributed data-storage system, called HMIBase, which maintains hierarchical indexes on the non-primary keys of tables to make data querying more efficient. HMIBase uses HBase as the underlying data store and creates a memory cache for more efficient data transmission. HMIBase uses coprocessors to process update requests. It also provides a client with query and update APIs, and a server that handles RPCs from the client and carries out the requested jobs. To improve the cache hit ratio, we propose a memory cache replacement strategy for HMIBase, called the Hot Score algorithm. The experimental results show that the Hot Score algorithm performs better than other cache-replacement strategies.
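To make the idea of a score-based cache replacement strategy concrete, the following is a minimal illustrative sketch in Python. It is not the Hot Score algorithm itself, which this abstract does not define. The class name `ScoredCache`, the `decay` parameter, and the scoring rule (add 1 on each access, multiply every score by `decay` on each operation) are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: object
    score: float = 1.0  # hypothetical "hotness" score, not the paper's formula


class ScoredCache:
    """Fixed-capacity cache that evicts the entry with the lowest score."""

    def __init__(self, capacity, decay=0.5):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.decay = decay   # assumed aging factor applied on every operation
        self.entries = {}    # maps key -> Entry
        self.hits = 0
        self.misses = 0

    def _age(self):
        # Older accesses count for less: shrink every score before each operation.
        for entry in self.entries.values():
            entry.score *= self.decay

    def get(self, key):
        self._age()
        entry = self.entries.get(key)
        if entry is None:
            self.misses += 1
            return None
        entry.score += 1.0  # reward the entry that was just accessed
        self.hits += 1
        return entry.value

    def put(self, key, value):
        self._age()
        if key in self.entries:
            entry = self.entries[key]
            entry.value = value
            entry.score += 1.0
            return
        if len(self.entries) >= self.capacity:
            # Evict the "coldest" entry, i.e. the one with the lowest score.
            coldest = min(self.entries, key=lambda k: self.entries[k].score)
            del self.entries[coldest]
        self.entries[key] = Entry(value)

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In this sketch, an entry that is read repeatedly keeps a high score and survives eviction, while an entry that is inserted once and never read again decays and is replaced first. Under these assumptions, the hit ratio of such a policy could be compared against simpler policies such as LRU by replaying the same access trace through each.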