Exploiting SSDs in operational multiversion databases

Multiversion databases store both current and historical data. Rows are typically annotated with timestamps representing the period during which each row is (or was) valid. We develop novel techniques to reduce index maintenance in multiversion databases, so that indexes can be used effectively for analytical queries over current data without imposing a heavy burden on transaction throughput. To this end, we redesign persistent index data structures in the storage hierarchy to employ an extra level of indirection. The indirection level is stored on solid-state disks (SSDs), which support very fast random I/Os, so traversing the extra level of indirection incurs relatively little overhead. The extra level of indirection dramatically reduces the number of magnetic disk I/Os needed for index updates and localizes maintenance to indexes on updated attributes. Additionally, we batch insertions within the indirection layer to reduce the physical disk I/Os needed to index new records. In this work, we further exploit SSDs by introducing novel DeltaBlock techniques for storing recent changes to data on SSDs. Using DeltaBlocks, we propose an efficient method for periodically flushing recently changed data from SSDs to HDDs such that, on the one hand, we keep track of every change (or delta) to every record and, on the other hand, we avoid redundantly storing the unchanged portions of updated records. By reducing the index maintenance overhead on transactions, we enable operational data stores to create more indexes to support queries. We have developed a prototype of our indirection proposal by extending the widely used Generalized Search Tree (GiST) open-source project, which is also employed in PostgreSQL. Our working implementation demonstrates that we can reduce index maintenance and/or query processing cost by a factor of 3. For the insertion of new records, our batching technique saves up to 90% of the insertion time. For updates, our prototype reduces the database size by up to 80%, even with modest space allocated for DeltaBlocks on SSDs.
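
To make the effect of the indirection layer concrete, here is a minimal Python sketch. It assumes that secondary indexes map attribute values to stable logical record IDs (called `lid` here) and that a separate table maps each `lid` to the physical location (`rid`) of the record's current version. The class names, the in-memory dicts standing in for SSD- and HDD-resident structures, and the `index_writes` counter are hypothetical illustrations, not the prototype's code.

```python
class IndirectionTable:
    """Hypothetical LID -> RID map; in the design above it would reside on an SSD."""

    def __init__(self):
        self._lid_to_rid = {}

    def bind(self, lid, rid):
        self._lid_to_rid[lid] = rid

    def resolve(self, lid):
        return self._lid_to_rid[lid]


class VersionedTable:
    """Append-only version store whose secondary indexes hold LIDs, not RIDs."""

    def __init__(self, indexed_columns):
        self.heap = []                       # row versions; RID = list position
        self.indirection = IndirectionTable()
        self.indexes = {c: {} for c in indexed_columns}  # column -> value -> set of LIDs
        self.index_writes = 0                # index entries added or removed
        self._next_lid = 0

    def _append_version(self, row):
        self.heap.append(dict(row))
        return len(self.heap) - 1            # RID of the new version

    def insert(self, row):
        lid = self._next_lid
        self._next_lid += 1
        self.indirection.bind(lid, self._append_version(row))
        for col, index in self.indexes.items():
            index.setdefault(row[col], set()).add(lid)
            self.index_writes += 1
        return lid

    def update(self, lid, changes):
        old = self.heap[self.indirection.resolve(lid)]
        new = {**old, **changes}
        # The old version stays in the heap; only the LID -> RID binding moves.
        self.indirection.bind(lid, self._append_version(new))
        # Only indexes on columns whose value actually changed are touched.
        for col, index in self.indexes.items():
            if new[col] != old[col]:
                index[old[col]].discard(lid)
                index.setdefault(new[col], set()).add(lid)
                self.index_writes += 2

    def lookup(self, col, value):
        """Return the current versions of rows whose indexed column equals value."""
        lids = sorted(self.indexes[col].get(value, set()))
        return [self.heap[self.indirection.resolve(lid)] for lid in lids]
```

Because an update only rebinds a `lid` to a new `rid`, an update that leaves every indexed column unchanged performs no index writes in this sketch; with RID-based indexes, each index on the table would instead need an entry pointing at the new version.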
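
Why batching insertions reduces disk I/O can be illustrated with a simple cost model, sketched below under explicit assumptions: index leaves are modeled as pages that each cover a fixed key range (`key_range_per_page`), new `(key, lid)` entries are buffered until `batch_size` of them accumulate, and each flush writes every affected leaf page once. The class and its parameters are hypothetical; this is a generic buffering scheme, not a reproduction of the prototype's batching within the indirection layer.

```python
from collections import defaultdict


class BatchedIndexInserter:
    """Hypothetical model: buffer new (key, lid) entries, then merge them into
    index leaf pages in sorted order so each affected page is written once."""

    def __init__(self, batch_size, key_range_per_page):
        self.batch_size = batch_size
        self.key_range_per_page = key_range_per_page  # assumed fixed key range per leaf
        self.leaves = defaultdict(list)   # page id -> sorted (key, lid) entries
        self.pending = []                 # buffered entries not yet in the leaves
        self.page_writes = 0              # simulated leaf-page writes

    def _page_of(self, key):
        return key // self.key_range_per_page

    def insert(self, key, lid):
        self.pending.append((key, lid))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        by_page = defaultdict(list)
        for key, lid in sorted(self.pending):
            by_page[self._page_of(key)].append((key, lid))
        for page, entries in by_page.items():
            self.leaves[page] = sorted(self.leaves[page] + entries)
            self.page_writes += 1         # one write per distinct page, not per entry
        self.pending.clear()

    def lookup(self, key):
        """Search both the flushed leaves and the not-yet-flushed buffer."""
        leaf = self.leaves.get(self._page_of(key), [])
        hits = [lid for k, lid in leaf if k == key]
        hits += [lid for k, lid in self.pending if k == key]
        return sorted(hits)
```

With `batch_size=1` the model degenerates to one page write per inserted entry. For 100 consecutive keys spread over 10 pages, a single batch of 100 costs 10 page writes instead of 100; actual savings depend on the key distribution and the buffer size.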
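
The DeltaBlock goal of keeping every change while avoiding redundant copies of unchanged attributes can be sketched as follows. This is a hypothetical model: the `DeltaStore` class, the per-attribute space accounting, and the flush-when-full policy governed by `ssd_capacity` are assumptions made for illustration, with dicts standing in for SSD-resident DeltaBlocks and the HDD-resident history.

```python
class DeltaStore:
    """Hypothetical model of DeltaBlocks: changed attributes are buffered on SSD
    and periodically appended to an HDD-resident history."""

    def __init__(self, ssd_capacity):
        self.ssd_capacity = ssd_capacity  # max buffered attribute values (assumed unit)
        self.ssd_used = 0
        self.base = {}          # lid -> row as first inserted (HDD); insert time not modeled
        self.history = {}       # lid -> flushed (ts, delta) pairs, oldest first (HDD)
        self.delta_blocks = {}  # lid -> unflushed (ts, delta) pairs (SSD)
        self.flushes = 0

    def insert(self, lid, row):
        self.base[lid] = dict(row)
        self.history[lid] = []

    def update(self, lid, ts, changes):
        """Record only attributes whose values differ; ts must increase per record."""
        current = self.current(lid)
        delta = {c: v for c, v in changes.items() if current.get(c) != v}
        if not delta:
            return
        self.delta_blocks.setdefault(lid, []).append((ts, delta))
        self.ssd_used += len(delta)
        if self.ssd_used > self.ssd_capacity:
            self.flush()

    def flush(self):
        # Move every buffered delta to the history in one pass, then free the SSD space.
        for lid in sorted(self.delta_blocks):
            self.history[lid].extend(self.delta_blocks[lid])
        self.delta_blocks.clear()
        self.ssd_used = 0
        self.flushes += 1

    def as_of(self, lid, ts):
        """Reconstruct the version of a record visible at timestamp ts."""
        row = dict(self.base[lid])
        for t, delta in self.history[lid] + self.delta_blocks.get(lid, []):
            if t > ts:
                break
            row.update(delta)
        return row

    def current(self, lid):
        return self.as_of(lid, float("inf"))
```

In this sketch each update costs space proportional to the attributes it changes rather than a full row copy, and `as_of` can still rebuild any earlier version. The trade-off is that reconstructing a version replays deltas, so its cost grows with the number of changes to the record.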
