PLP: Page Latch-free Shared-everything OLTP

Scaling the performance of shared-everything transaction processing systems to highly-parallel multicore hardware remains a challenge for database system designers. Recent proposals alleviate locking and logging bottlenecks in the system, leaving page latching as the next potential problem. To tackle the page latching problem, we propose physiological partitioning (PLP). The PLP design applies logical-only partitioning, maintaining the desired properties of shared-everything designs, and introduces a multi-rooted B+Tree index structure (MRBTree) which enables the partitioning of the accesses at the physical page level. Logical partitioning and MRBTrees together ensure that all accesses to a given index page come from a single thread and, hence, can be entirely latch-free; an extended design makes heap page accesses thread-private as well. Eliminating page latching allows us to simplify key code paths in the system such as B+Tree operations leading to more efficient and maintainable code. Profiling a prototype PLP system running on different multicore machines shows that it acquires 85% and 68% fewer contentious critical sections, respectively, than an optimized conventional design and one based on logical-only partitioning. PLP also improves performance up to 40% and 18%, respectively, over the existing systems.

[1]  R. Bayer,et al.  Organization and maintenance of large ordered indices , 1970, SIGFIDET '70.

[2]  Michael Stonebraker,et al.  The Case for Shared Nothing , 1985, HPTS.

[3]  Donovan A. Schneider,et al.  The Gamma Database Machine Project , 1990, IEEE Trans. Knowl. Data Eng..

[4]  C. Mohan,et al.  ARIES/KVL: A Key-Value Locking Method for Concurrency Control of Multiaction Transactions Operating on B-Tree Indexes , 1990, VLDB.

[5]  C. Mohan,et al.  ARIES/IM: an efficient and high concurrency index management method using write-ahead logging , 1992, SIGMOD '92.

[6]  Peter M. Spiro How the Rdb � VMS Data Sharing System Became Fast , 1992 .

[7]  Beng Chin Ooi,et al.  Towards self-tuning data placement in parallel database systems , 2000, SIGMOD '00.

[8]  Gerhard Weikum,et al.  The LHAM log-structured history data access method , 2000, The VLDB Journal.

[9]  Eric A. Brewer,et al.  Towards robust distributed systems (abstract) , 2000, PODC '00.

[10]  Sashikanth Chandrasekaran,et al.  Cache Fusion: Extending Shared-Disk Clusters with Shared Caches , 2001, VLDB.

[11]  Goetz Graefe,et al.  Sorting And Indexing With Partitioned B-Trees , 2003, CIDR.

[12]  David A. Wood,et al.  Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[13]  Thomas F. Wenisch,et al.  Memory coherence activity prediction in commercial workloads , 2004, WMPI '04.

[14]  Mark Moir,et al.  Using elimination to implement scalable and lock-free FIFO queues , 2005, SPAA '05.

[15]  Eljas Soisalon-Soininen,et al.  B-tree concurrency control and recovery in page-server database systems , 2006, TODS.

[16]  Babak Falsafi,et al.  Database Servers on Chip Multiprocessors: Limitations and Opportunities , 2007, CIDR.

[17]  Michael Stonebraker,et al.  The End of an Architectural Era (It's Time for a Complete Rewrite) , 2007, VLDB.

[18]  Pat Helland,et al.  Life beyond Distributed Transactions: an Apostate's Opinion , 2007, CIDR.

[19]  Mark D. Hill,et al.  Amdahl's Law in the Multicore Era , 2008, Computer.

[20]  Michael Stonebraker,et al.  OLTP through the looking glass, and what we found there , 2008, SIGMOD Conference.

[21]  Babak Falsafi,et al.  Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.

[22]  Thomas F. Wenisch,et al.  Spatio-temporal memory streaming , 2009, ISCA '09.

[23]  Shyam Antony,et al.  Thread Cooperation in Multicore Architectures for Frequency Counting over Multiple Data Streams , 2009, Proc. VLDB Endow..

[24]  Shimin Chen,et al.  FlashLogging: exploiting flash devices for synchronous logging performance , 2009, SIGMOD Conference.

[25]  Babak Falsafi,et al.  Shore-MT: a scalable storage manager for the multicore era , 2009, EDBT '09.

[26]  Ippokratis Pandis,et al.  Improving OLTP Scalability using Speculative Lock Inheritance , 2009, Proc. VLDB Endow..

[27]  Ippokratis Pandis,et al.  Data-oriented transaction execution , 2010, Proc. VLDB Endow..

[28]  Ippokratis Pandis,et al.  Aether: A Scalable Approach to Logging , 2010, Proc. VLDB Endow..

[29]  Carlo Curino,et al.  Schism , 2010, Proc. VLDB Endow..

[30]  Daniel J. Abadi,et al.  Low overhead concurrency control for partitioned main memory databases , 2010, SIGMOD Conference.

[31]  Ippokratis Pandis,et al.  A data-oriented transaction execution engine and supporting tools , 2011, SIGMOD '11.