Scalable and dynamically balanced shared-everything OLTP with physiological partitioning

Scaling the performance of shared-everything transaction processing systems to highly parallel multicore hardware remains a challenge for database system designers. Recent proposals alleviate locking and logging bottlenecks in the system, leaving page latching as the next potential problem. To tackle the page latching problem, we propose physiological partitioning (PLP). PLP applies logical-only partitioning, maintaining the desired properties of shared-everything designs, and introduces a multi-rooted B+Tree index structure (MRBTree) that enables the partitioning of accesses at the physical page level. Logical partitioning and MRBTrees together ensure that all accesses to a given index page come from a single thread and, hence, can be entirely latch-free; an extended design makes heap page accesses thread-private as well. Moreover, MRBTrees offer an infrastructure for easy repartitioning and allow us to build a lightweight dynamic load balancing mechanism (DLB) on top of PLP. Profiling a PLP prototype running on different multicore machines shows that it acquires 85% and 68% fewer contentious critical sections than an optimized conventional design and one based on logical-only partitioning, respectively. PLP also improves performance by up to almost 50% over the existing systems, while DLB makes the system rapid and robust in both detecting and handling load imbalances.
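To make the routing idea concrete, here is a minimal Python sketch of a multi-rooted index, not the paper's implementation. It assumes a sorted partition table of range boundaries, with one independent sub-index per key range. The class name `MultiRootedIndex`, its methods, and the use of a plain `dict` in place of a real B+Tree sub-tree are all hypothetical choices made for illustration; thread ownership is only described in comments, not enforced.

```python
import bisect


class MultiRootedIndex:
    """Toy stand-in for a multi-rooted index (hypothetical, for illustration).

    Each key range has its own sub-index. If every range is assigned to
    exactly one worker thread, no two threads touch the same sub-index,
    which is the property that lets page accesses skip latching.
    """

    def __init__(self, boundaries):
        # boundaries[i] is the smallest key that belongs to partition i.
        self.boundaries = sorted(boundaries)
        # A dict stands in for each B+Tree sub-tree (an assumption).
        self.subtrees = [dict() for _ in self.boundaries]

    def partition_of(self, key):
        # Find the last boundary that is <= key.
        idx = bisect.bisect_right(self.boundaries, key) - 1
        if idx < 0:
            raise KeyError(f"key {key!r} is below the lowest boundary")
        return idx

    def insert(self, key, value):
        self.subtrees[self.partition_of(key)][key] = value

    def lookup(self, key):
        return self.subtrees[self.partition_of(key)].get(key)

    def split(self, key):
        """Repartition: start a new partition at `key`.

        Only the one affected sub-index is touched; the others are
        left alone, which is what makes repartitioning cheap here.
        """
        idx = self.partition_of(key)
        if self.boundaries[idx] == key:
            return  # a partition already starts at this key
        old = self.subtrees[idx]
        moved = {k: v for k, v in old.items() if k >= key}
        for k in moved:
            del old[k]
        self.boundaries.insert(idx + 1, key)
        self.subtrees.insert(idx + 1, moved)


index = MultiRootedIndex([0, 100, 200])
index.insert(5, "a")
index.insert(150, "b")
index.insert(199, "c")
index.split(175)  # e.g. a load balancer decides range [100, 200) is hot
```

In this sketch a load balancer would call `split` on a hot range and hand the new partition to another worker; how PLP actually detects imbalance and moves entries is not specified in the abstract and is not modeled here.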
