OLTP on Hardware Islands

Modern hardware is abundantly parallel and increasingly heterogeneous. The numerous processing cores have nonuniform access latencies to the main memory and to the processor caches, which causes variability in the communication costs. Unfortunately, database systems mostly assume that all processing cores are the same and that microarchitecture differences are not significant enough to appear in critical database execution paths. As we demonstrate in this paper, however, hardware heterogeneity does appear in the critical path and conventional database architectures achieve suboptimal and even worse, unpredictable performance. We perform a detailed performance analysis of OLTP deployments in servers with multiple cores per CPU (multicore) and multiple CPUs per server (multisocket). We compare different database deployment strategies where we vary the number and size of independent database instances running on a single server, from a single shared-everything instance to fine-grained shared-nothing configurations. We quantify the impact of non-uniform hardware on various deployments by (a) examining how efficiently each deployment uses the available hardware resources and (b) measuring the impact of distributed transactions and skewed requests on different workloads. Finally, we argue in favor of shared-nothing deployments that are topology- and workload-aware and take advantage of fast on-chip communication between islands of cores on the same socket.

[1]  Philip A. Bernstein,et al.  Categories and Subject Descriptors: H.2.4 [Database Management]: Systems. , 2022 .

[2]  William J. Bolosky,et al.  Mach: A New Kernel Foundation for UNIX Development , 1986, USENIX Summer.

[3]  Eric A. Brewer,et al.  Towards robust distributed systems (abstract) , 2000, PODC '00.

[4]  Jignesh M. Patel,et al.  Design and evaluation of main memory hash join algorithms for multi-core CPUs , 2011, SIGMOD '11.

[5]  Lingjia Tang,et al.  The impact of memory subsystem resource sharing on datacenter applications , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[6]  Ippokratis Pandis,et al.  Data-oriented transaction execution , 2010, Proc. VLDB Endow..

[7]  Babak Falsafi,et al.  Shore-MT: a scalable storage manager for the multicore era , 2009, EDBT '09.

[8]  Alexandra Fedorova,et al.  A case for NUMA-aware contention management on multicore systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[9]  J. T. Robinson,et al.  On optimistic methods for concurrency control , 1979, TODS.

[10]  Adrian Schüpbach,et al.  The multikernel: a new OS architecture for scalable multicore systems , 2009, SOSP '09.

[11]  Ippokratis Pandis,et al.  Scalable and dynamically balanced shared-everything OLTP with physiological partitioning , 2012, The VLDB Journal.

[12]  Michael Stonebraker,et al.  The Case for Shared Nothing , 1985, HPTS.

[13]  David J. DeWitt,et al.  Shoring up persistent applications , 1994, SIGMOD '94.

[14]  Babak Falsafi,et al.  Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.

[15]  Dawson R. Engler,et al.  Exokernel: an operating system architecture for application-level resource management , 1995, SOSP.

[16]  Michael Stonebraker,et al.  The End of an Architectural Era (It's Time for a Complete Rewrite) , 2007, VLDB.

[17]  Jignesh M. Patel,et al.  High-Performance Concurrency Control Mechanisms for Main-Memory Databases , 2011, Proc. VLDB Endow..

[18]  Ippokratis Pandis,et al.  PLP: Page Latch-free Shared-everything OLTP , 2011, Proc. VLDB Endow..

[19]  Alfons Kemper,et al.  HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[20]  Daniel J. Abadi,et al.  Low overhead concurrency control for partitioned main memory databases , 2010, SIGMOD Conference.

[21]  David J. DeWitt,et al.  DBMSs on a Modern Processor: Where Does Time Go? , 1999, VLDB.

[22]  Michael Stonebraker,et al.  OLTP through the looking glass, and what we found there , 2008, SIGMOD Conference.

[23]  Stanley B. Zdonik,et al.  On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems , 2011, Proc. VLDB Endow..

[24]  Carlo Curino,et al.  Schism , 2010, Proc. VLDB Endow..

[25]  Werner Vogels,et al.  Building reliable distributed systems at a worldwide scale demands trade-offs between consistency and availability. , 2022 .

[26]  Thomas F. Wenisch,et al.  Memory coherence activity prediction in commercial workloads , 2004, WMPI '04.

[27]  Sashikanth Chandrasekaran,et al.  Cache Fusion: Extending Shared-Disk Clusters with Shared Caches , 2001, VLDB.

[28]  Gustavo Alonso,et al.  Database engines on multicores, why parallelize when you can distribute? , 2011, EuroSys '11.

[29]  Ippokratis Pandis,et al.  Improving OLTP Scalability using Speculative Lock Inheritance , 2009, Proc. VLDB Endow..

[30]  Luiz André Barroso,et al.  Memory system characterization of commercial workloads , 1998, ISCA.

[31]  David A. Wood,et al.  Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[32]  Pat Helland,et al.  Life beyond Distributed Transactions: an Apostate's Opinion , 2007, CIDR.

[33]  Ippokratis Pandis,et al.  Aether: A Scalable Approach to Logging , 2010, Proc. VLDB Endow..