ERIS: A NUMA-Aware In-Memory Storage Engine for Analytical Workload

The ever-growing demand for more computing power forces hardware vendors to put an increasing number of multiprocessors into a single server system, which usually exhibits a non-uniform memory access (NUMA). In-memory database systems running on NUMA platforms face several issues such as the increased latency and the decreased bandwidth when accessing remote main memory. To cope with these NUMA-related issues, NUMA-awareness has to be considered as a major design principle for the fundamental architecture of a database system. In this paper we present ERIS, a NUMA-aware inmemory storage engine that is based on a data-oriented architecture. In contrast to existing approaches that focus on transactional workloads on a disk-based DBMS, ERIS aims at tera-scale analytical workloads that are executed entirely in main memory. ERIS uses an adaptive partitioning approach that exploits the topology of the underlying NUMA platform and significantly reduces NUMA-related issues. We evaluate ERIS on widespread standard server systems as well as on a system consisting of 64 multiprocessors and 512 cores. On these platforms, we achieve a more than linear speedup for index lookups and scalable parallel scan operations that are only limited by the available local bandwidth of the multiprocessor. Moreover, we measured a performance gain of up to 200% (index lookups) respectively 660% (column scans) in the memory-bound case compared to a NUMA-agnostic storage subsystem.

[1]  Wolfgang Lehner,et al.  Experimental Evaluation of NUMA Effects on Database Management Systems , 2013, BTW.

[2]  Viktor Leis,et al.  Massively Parallel NUMA-aware Hash Joins , 2013, IMDM@VLDB.

[3]  Wolfgang E. Nagel,et al.  Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4]  Ippokratis Pandis,et al.  PLP: Page Latch-free Shared-everything OLTP , 2011, Proc. VLDB Endow..

[5]  Luca Benini,et al.  An efficient distributed memory interface for many-core platform with 3D stacked DRAM , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[6]  Simon Fraser User-level scheduling on NUMA multicore systems under Linux , 2011 .

[7]  Viktor Leis,et al.  Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age , 2014, SIGMOD Conference.

[8]  Hsien-Hsin S. Lee,et al.  Design and Analysis of 3D-MAPS (3D Massively Parallel Processor with Stacked Memory) , 2015, IEEE Transactions on Computers.

[9]  Alfons Kemper,et al.  Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems , 2012, Proc. VLDB Endow..

[10]  Wolfgang Lehner,et al.  Efficient In-Memory Indexing with Generalized Prefix Trees , 2011, BTW.

[11]  Alexandra Fedorova,et al.  A case for NUMA-aware contention management on multicore systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[12]  Ippokratis Pandis,et al.  NUMA-aware algorithms: the case of data shuffling , 2013, CIDR.

[13]  Frederick Reiss,et al.  Main-memory scan sharing for multi-core CPUs , 2008, Proc. VLDB Endow..

[14]  Anastasia Ailamaki,et al.  ATraPos: Adaptive transaction processing on hardware Islands , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[15]  Ippokratis Pandis,et al.  OLTP on Hardware Islands , 2012, Proc. VLDB Endow..

[16]  Kenneth A. Ross,et al.  Making B+- trees cache conscious in main memory , 2000, SIGMOD '00.

[17]  Ippokratis Pandis,et al.  Data-oriented transaction execution , 2010, Proc. VLDB Endow..

[18]  Sudipta Sengupta,et al.  LLAMA: A Cache/Storage Subsystem for Modern Hardware , 2013, Proc. VLDB Endow..