Understanding the communication characteristics in HBase: What are the fundamental bottlenecks?

HBase is an open-source, distributed, column-oriented key/value database. In this paper, we analyze the performance of HBase. Existing literature on HBase provides high-level descriptions of its operations and presents overall performance results. We conducted comprehensive experiments and identified the factors that contribute to the overall latency of Get and Put operations. Our experimental results reveal that, for in-memory workloads, communication time accounts for about 67% and 45% of the latency of a 1 KB Get request over 1 Gigabit Ethernet (1 GigE) and 10 Gigabit Ethernet (10 GigE) networks, respectively. Our results show that the HBase communication stack and its associated operations need to be redesigned to take advantage of high-performance networks such as InfiniBand and their features.