Fast Lookups for In-Memory Column Stores: Group-Key Indices, Lookup and Maintenance

In-memory column-oriented databases have become a major topic of interest in both academia and commercial applications. The demand for analytics on up-to-the-minute data and the availability of systems with hundreds of gigabytes of main memory have led to the proposal of combined systems, which provide a single database for operational processing and ad hoc analytical queries on current data. Recent research has identified in-memory column stores as a possible database architecture to meet these requirements. They are claimed to deliver analytical insights while providing sufficient transactional performance. Data in such systems is typically split into a small write-optimized partition, which gains speed from its small size and tree-structured indices, and a larger read-only partition. To enable fast transactional and analytical performance, an index on the large read-only partition is advisable in many cases. In this paper we present an index structure for the read-only partition, describe its advantages over the column scan, and present an algorithm for maintaining the index. The index drastically reduces memory traffic during query execution, leading to faster lookups and joins and thereby benefiting both transactional and analytical processing. We analyze the memory traffic of index lookups in comparison with full column scans, as well as the memory traffic of index maintenance. We develop formulas to determine, at query runtime, whether an index lookup is preferable to a column scan. While other research has claimed that an index for in-memory systems should simply be rebuilt after every bulk load, we show that a substantial performance increase can be achieved by reusing the former index to create the updated index.
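To make the contrast between an index lookup and a full column scan concrete, the following is a minimal sketch only. It assumes a dictionary-encoded read-only column (a sorted dictionary plus a vector of value ids) and an inverted-list layout made of an offset array and a position list. All function and variable names are hypothetical, and the index structure developed in the paper may differ in its details.

```python
from bisect import bisect_left


def build_index(value_ids, dict_size):
    """Build (offsets, positions) with a counting sort over the value ids.

    positions[offsets[v]:offsets[v + 1]] holds the rows that contain value id v.
    This layout is an assumption for illustration, not the paper's definition.
    """
    offsets = [0] * (dict_size + 1)
    for v in value_ids:                 # count occurrences of each value id
        offsets[v + 1] += 1
    for v in range(dict_size):          # prefix sum turns counts into start offsets
        offsets[v + 1] += offsets[v]
    positions = [0] * len(value_ids)
    cursor = offsets[:-1].copy()        # next free slot for each value id
    for row, v in enumerate(value_ids):
        positions[cursor[v]] = row
        cursor[v] += 1
    return offsets, positions


def find_value_id(dictionary, value):
    """Binary search in the sorted dictionary; returns None if value is absent."""
    v = bisect_left(dictionary, value)
    if v < len(dictionary) and dictionary[v] == value:
        return v
    return None


def index_lookup(dictionary, offsets, positions, value):
    """Touch only the dictionary, two offsets, and the matching positions."""
    v = find_value_id(dictionary, value)
    if v is None:
        return []
    return positions[offsets[v]:offsets[v + 1]]


def scan_lookup(dictionary, value_ids, value):
    """Baseline: read every entry of the value-id vector."""
    v = find_value_id(dictionary, value)
    if v is None:
        return []
    return [row for row, vid in enumerate(value_ids) if vid == v]


dictionary = ["Berlin", "Paris", "Rome"]     # sorted distinct values
value_ids = [1, 0, 2, 1, 0, 1]               # one value id per row
offsets, positions = build_index(value_ids, len(dictionary))
rows_via_index = index_lookup(dictionary, offsets, positions, "Paris")
rows_via_scan = scan_lookup(dictionary, value_ids, "Paris")
```

In this sketch both lookups return the same row positions, but the index lookup reads only the slice of the position list that belongs to the requested value, whereas the scan reads the whole value-id vector. That difference in data touched is the kind of memory-traffic saving the abstract refers to.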
This article was presented at: The Third International Workshop on Accelerating Data Management Systems using Modern Processor and Storage Architectures (ADMS’12). Copyright 2012.