Exploring Prefetching, Pre-Execution and Branch Outcome Streaming for In-Memory Database Lookups

Lookup operations in in-memory databases are heavily memory-bound because they often rely on pointer-chasing traversals of linked data structures. They are also branch-heavy, and their branches are hard to predict because the keys being looked up are random. In this study, we show that although cache misses are the primary bottleneck for these applications, prefetching alone achieves only a small fraction of the potential performance benefit unless branch mispredictions are also eliminated. We propose the Node Tracker (NT), a novel programmable prefetcher/pre-execution unit that is highly effective at exploiting inter key-lookup parallelism to improve single-thread performance. We extend the NT with branch outcome streaming (BOS) to reduce branch mispredictions and show that NT with BOS can achieve an extra 3x speedup. Finally, we evaluate the NT as a pre-execution unit and show that it further improves performance in both single- and multi-threaded execution modes.
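As a rough, software-only illustration of the inter key-lookup parallelism mentioned above (not the NT hardware itself), the sketch below contrasts a single pointer-chasing lookup with a batch of lookups advanced one node at a time in round-robin fashion. The binary search tree and the names Node, insert, lookup_one and lookup_interleaved are hypothetical choices made for this example; the abstract does not specify the index structure. Each step within a lookup depends on the node loaded in the previous step, but lookups for different keys are independent of one another, which is what makes it possible to overlap their memory accesses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One node of a hypothetical binary-search-tree index."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Insert (key, value) into the tree and return the (possibly new) root."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    elif key > root.key:
        root.right = insert(root.right, key, value)
    else:
        root.value = value
    return root


def lookup_one(root: Optional[Node], key: int) -> Optional[str]:
    """Serial pointer chase: every step needs the node from the previous step."""
    node = root
    while node is not None:
        if key == node.key:
            return node.value
        # Data-dependent branch: its direction depends on the (random) key.
        node = node.left if key < node.key else node.right
    return None


def lookup_interleaved(root: Optional[Node], keys: List[int]) -> List[Optional[str]]:
    """Advance all lookups one level per round; the lookups never depend on each other."""
    results: List[Optional[str]] = [None] * len(keys)
    cursors = [(i, root) for i in range(len(keys))]  # (key index, current node)
    while cursors:
        next_cursors = []
        for i, node in cursors:
            if node is None:
                continue  # key not present; result stays None
            if keys[i] == node.key:
                results[i] = node.value
                continue
            child = node.left if keys[i] < node.key else node.right
            next_cursors.append((i, child))
        cursors = next_cursors
    return results
```

In pure Python the interleaved version is not faster; the point is only that the per-key traversals are independent, so a prefetcher or pre-execution unit could in principle keep several of them in flight at once.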
