Eliminating unscalable communication in transaction processing

Multicore hardware demands software parallelism. Transaction processing workloads typically exhibit high concurrency and thus provide ample opportunities for parallel execution. Unfortunately, because of the characteristics of the application, transaction processing systems must moderate and coordinate communication between independent agents, and building a high-performing transaction processing system that incurs no communication whatsoever is notoriously difficult. As a result, communication bottlenecks often prevent transaction processing systems from converting abundant, even embarrassing, request-level parallelism into execution parallelism. Transaction processing system designers must therefore find ways to achieve scalability while still allowing communication to occur. To this end, we identify three forms of communication in the system (unbounded, fixed, and cooperative) and argue that only the first poses a fundamental threat to scalability. The other two tend not to impose obstacles to scalability, though they may reduce single-thread performance. We argue that proper analysis of communication patterns in any software system is a powerful tool for improving the system’s scalability. We then present and evaluate, under a common framework, techniques that attack significant sources of unbounded communication during transaction processing, and we sketch a solution for the sources that remain. The solutions we present affect fundamental services of any transaction processing engine, such as locking, logging, physical page accesses, and buffer pool frame accesses. They either reduce such communication through caching, downgrade it to a less threatening type, or eliminate it completely through system design. We find that the latter technique, revisiting the transaction processing architecture, is the most effective.
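To make the three forms of communication concrete, the following Python sketch models each one as a lock-protected counter and records which workers touch it. It assumes a reading of the terms that the abstract does not spell out. Under unbounded communication, the number of threads meeting at one point grows with the number of workers. Under fixed communication, that number stays constant (here, pairs of workers). Under cooperative communication, a thread that finds the point busy leaves its request for the current holder to apply in one batch; this is a flat-combining-style scheme chosen for illustration. `CommPoint`, `CombiningPoint`, and the driver functions are hypothetical names. Because of Python's global interpreter lock, the sketch demonstrates the communication pattern, not a speedup.

```python
import threading


def run_workers(n_workers, body):
    """Run body(worker_id) on n_workers threads and wait for all of them."""
    threads = [threading.Thread(target=body, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


class CommPoint:
    """Hypothetical communication point: a locked counter that records
    which workers have touched it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.participants = set()
        self.value = 0

    def add(self, worker_id):
        with self._lock:
            self.participants.add(worker_id)
            self.value += 1


def unbounded(n_workers, ops=100):
    # One global point shared by every worker (think: a single hot
    # lock-table entry), so participants grow with n_workers.
    point = CommPoint()

    def body(w):
        for _ in range(ops):
            point.add(w)

    run_workers(n_workers, body)
    return [point]


def fixed(n_workers, ops=100):
    # Workers are paired and each pair has a private point: at most two
    # participants per point, however many workers there are.
    points = [CommPoint() for _ in range((n_workers + 1) // 2)]

    def body(w):
        for _ in range(ops):
            points[w // 2].add(w)

    run_workers(n_workers, body)
    return points


class CombiningPoint:
    """Cooperative communication: a worker that finds the shared point busy
    leaves its request pending, and the holder applies all pending
    requests in one batch."""

    def __init__(self):
        self._shared = threading.Lock()        # the contended point
        self._pending_lock = threading.Lock()  # guards only the request list
        self._pending = []
        self.value = 0
        self.batches = 0                       # updates made through the point

    def add(self, delta=1):
        # add() may return before its own request is applied; the total is
        # exact once all workers have finished.
        with self._pending_lock:
            self._pending.append(delta)
        # Become the combiner if the point is free; otherwise the current
        # holder will see our request when it re-checks after releasing.
        while self._shared.acquire(blocking=False):
            try:
                with self._pending_lock:
                    batch, self._pending = self._pending, []
                if batch:
                    self.value += sum(batch)
                    self.batches += 1
            finally:
                self._shared.release()
            with self._pending_lock:
                if not self._pending:
                    return


def cooperative(n_workers, ops=100):
    point = CombiningPoint()

    def body(_w):
        for _ in range(ops):
            point.add()

    run_workers(n_workers, body)
    return point
```

Calling `unbounded(n)` for growing `n` reports `n` participants at its single point, while every point returned by `fixed(n)` has at most two. `cooperative(n)` always reaches the correct total, but it may need fewer shared-point updates than requests when workers overlap, which is the sense in which waiting threads cooperate rather than contend.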
The final design cuts unbounded communication by roughly an order of magnitude compared with the baseline, while exhibiting better scalability on multicore machines.
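Eliminating communication through system design can be illustrated with a thread-to-data arrangement. Each partition of the data has exactly one owner thread, and other threads send that owner requests instead of touching the records themselves, so the records need no lock or latch shared by all workers. The sketch below is a hypothetical, minimal Python illustration of that idea, assuming hash partitioning by key. `PartitionOwner`, `PartitionedStore`, and `run_demo` are illustrative names, not the architecture evaluated in the paper.

```python
import queue
import threading


class PartitionOwner(threading.Thread):
    """Hypothetical owner thread: the only thread that reads or writes its
    partition while the store is running, so the records need no locks."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.records = {}

    def run(self):
        while True:
            request = self.inbox.get()
            if request is None:            # shutdown sentinel
                return
            key, delta, reply = request
            self.records[key] = self.records.get(key, 0) + delta
            reply.put(self.records[key])


class PartitionedStore:
    """Routes every request on a key to the single owner of that key."""

    def __init__(self, n_partitions):
        self.owners = [PartitionOwner() for _ in range(n_partitions)]
        for owner in self.owners:
            owner.start()

    def increment(self, key, delta=1):
        # Send the request to the key's owner and wait for its answer.
        owner = self.owners[hash(key) % len(self.owners)]
        reply = queue.Queue(maxsize=1)
        owner.inbox.put((key, delta, reply))
        return reply.get()

    def close(self):
        for owner in self.owners:
            owner.inbox.put(None)
        for owner in self.owners:
            owner.join()


def run_demo(n_clients=8, n_keys=20, n_partitions=4):
    """Each client increments every key once; returns the final counts."""
    store = PartitionedStore(n_partitions)

    def client():
        for key in range(n_keys):
            store.increment(key)

    clients = [threading.Thread(target=client) for _ in range(n_clients)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    store.close()
    totals = {}
    for owner in store.owners:
        totals.update(owner.records)       # partitions hold disjoint keys
    return totals
```

With the default arguments, `run_demo()` returns a count of 8 for each of the keys 0 through 19. The per-partition inboxes are still shared between the clients and the owner, so a real design would also need to keep that routing path from becoming a new unbounded communication point; the sketch only shows that the data itself is never touched by more than one thread.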
