Secondary Indexing Techniques for Key-Value Stores: Two Rings To Rule Them All

Secondary indices are traditionally used in database management systems (DBMSs) to speed up queries that do not read data by the table's keys. However, many newer distributed NoSQL data stores, even those that provide a table-based data model, such as HBase, do not yet have a built-in secondary indexing feature. In this paper, we explore the challenges of indexing modern distributed table-based data stores and investigate two secondary indexing approaches that we have integrated into HBase. Our detailed analysis and experimental results demonstrate the benefits of both approaches. Further, we show that secondary index implementation decisions cannot be made in isolation from the data distribution, and that different indexing approaches can cater to different needs.