ADDICT: Advanced Instruction Chasing for Transactions

Recent studies highlight that traditional transaction processing systems utilize the micro-architectural features of modern processors very poorly. L1 instruction cache and long-latency data misses dominate execution time. As a result, more than half of the execution cycles are wasted on memory stalls. Previous works on reducing stall time aim at improving locality through either hardware or software techniques. However, exploiting hardware resources based on the hints given by the software-side has not been widely studied for data management systems. In this paper, we observe that, independently of their high-level functionality, transactions running in parallel on a multicore system execute actions chosen from a limited sub-set of predefined database operations. Therefore, we initially perform a memory characterization study of modern transaction processing systems using standardized benchmarks. The analysis demonstrates that same-type transactions exhibit at most 6% overlap in their data footprints whereas there is up to 98% overlap in instructions. Based on the findings, we design ADDICT, a transaction scheduling mechanism that aims at maximizing the instruction cache locality. ADDICT determines the most frequent actions of database operations, whose instruction footprint can fit in an L1 instruction cache, and assigns a core to execute each of these actions. Then, it schedules each action on its corresponding core. Our prototype implementation of ADDICT reduces L1 instruction misses by 85% and the long latency data misses by 20%. As a result, ADDICT leads up to a 50% reduction in the total execution time for the evaluated workloads.

[1]  Margaret Martonosi,et al.  Informing memory operations: memory performance feedback mechanisms and their applications , 1998, TOCS.

[2]  Sarita V. Adve,et al.  Performance of database workloads on shared-memory systems with out-of-order processors , 1998, ASPLOS VIII.

[3]  Ippokratis Pandis,et al.  Aether: A Scalable Approach to Logging , 2010, Proc. VLDB Endow..

[4]  Anastasia Ailamaki,et al.  QPipe: a simultaneously pipelined relational query engine , 2005, SIGMOD '05.

[5]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[6]  Anastasia Ailamaki,et al.  SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[7]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[8]  Gary Valentin,et al.  Fractal prefetching B+-Trees: optimizing both cache and disk performance , 2002, SIGMOD '02.

[9]  Thomas F. Wenisch,et al.  Spatio-temporal memory streaming , 2009, ISCA '09.

[10]  Babak Falsafi,et al.  Shore-MT: a scalable storage manager for the multicore era , 2009, EDBT '09.

[11]  Thomas F. Wenisch,et al.  Temporal streams in commercial server applications , 2008, 2008 IEEE International Symposium on Workload Characterization.

[12]  Koushik Chakraborty,et al.  Computation spreading: employing hardware migration to specialize CMP cores on-the-fly , 2006, ASPLOS XII.

[13]  Anastasia Ailamaki,et al.  Improving instruction cache performance in OLTP , 2006, TODS.

[14]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[15]  Anastasia Ailamaki,et al.  ATraPos: Adaptive transaction processing on hardware Islands , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[16]  Ippokratis Pandis,et al.  From A to E: analyzing TPC's OLTP benchmarks: the obsolete, the ubiquitous, the unexplored , 2013, EDBT '13.

[17]  Ippokratis Pandis,et al.  Improving OLTP Scalability using Speculative Lock Inheritance , 2009, Proc. VLDB Endow..

[18]  R. Stets,et al.  A detailed comparison of two transaction processing workloads , 2002, 2002 IEEE International Workshop on Workload Characterization.

[19]  Anastasia Ailamaki,et al.  OLTP in wonderland: where do cache misses come from in major OLTP components? , 2013, DaMoN '13.

[20]  Babak Falsafi,et al.  Proactive instruction fetch , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[21]  Gabriel H. Loh,et al.  Zesto: A cycle-level simulator for highly detailed microarchitecture exploration , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[22]  Srinivasan Parthasarathy,et al.  Cache-conscious frequent pattern mining on modern and emerging processors , 2007, The VLDB Journal.

[23]  Anastasia Ailamaki,et al.  STREX: boosting instruction cache reuse in OLTP workloads through stratified transaction execution , 2013, ISCA.