Real-Time LSM-Trees for HTAP Workloads

Real-time data analytics systems such as SAP HANA, MemSQL, and IBM Wildfire employ hybrid data layouts, in which data are stored in different formats throughout their lifecycle. Recent data are stored in a row-oriented format to serve OLTP workloads and support high data rates, while older data are transformed to a column-oriented format for OLAP access patterns. We observe that a Log-Structured Merge (LSM) Tree is a natural fit for a lifecycle-aware storage engine due to its high write throughput and level-oriented structure, in which records propagate from one level to the next over time. To build a lifecycle-aware storage engine using an LSM-Tree, we make a crucial modification to allow different data layouts in different levels, ranging from purely row-oriented to purely column-oriented, leading to a Real-Time LSM-Tree. We give a cost model and an algorithm to design a Real-Time LSM-Tree that is suitable for a given workload, followed by an experimental evaluation of LASER, a prototype implementation of our idea built on top of the RocksDB key-value store. In our evaluation, LASER is almost 5x faster than Postgres (a pure row-store) and two orders of magnitude faster than MonetDB (a pure column-store) for real-time data analytics workloads.
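The idea of per-level layouts can be illustrated with a minimal Python sketch. It is not LASER's implementation and does not use the RocksDB API. The schema (`id`, `price`, `qty`, `ts`), the three-level `LAYOUT` table, and the helpers `split_row` and `groups_read` are all hypothetical names chosen for illustration. The sketch assumes each level is described by a list of column groups: a single group containing every column behaves like a row store, and one group per column behaves like a column store.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]
ColumnGroup = Tuple[str, ...]

# Hypothetical per-level layout (an assumption, not taken from the paper):
# shallow levels hold recent data in wide groups, deeper levels hold older
# data in narrower groups.
LAYOUT: List[List[ColumnGroup]] = [
    [("id", "price", "qty", "ts")],             # level 0: purely row-oriented
    [("id", "price"), ("qty", "ts")],           # level 1: hybrid column groups
    [("id",), ("price",), ("qty",), ("ts",)],   # level 2: purely column-oriented
]


def split_row(key: int, row: Row, level: int) -> List[Tuple[ColumnGroup, int, tuple]]:
    """Split one record into a fragment per column group of the given level.

    Each fragment keeps the record key so that the fragments can be
    stitched back together when a query needs several groups.
    """
    return [
        (group, key, tuple(row[col] for col in group))
        for group in LAYOUT[level]
    ]


def groups_read(level: int, wanted: Set[str]) -> int:
    """Count the column groups a query touching `wanted` must read at a level."""
    return sum(1 for group in LAYOUT[level] if wanted & set(group))


if __name__ == "__main__":
    record = {"id": 7, "price": 9.5, "qty": 3, "ts": 1000}
    for level in range(len(LAYOUT)):
        print(level, split_row(7, record, level))
```

In this toy model, a query that reads only `price` touches one narrow group at level 2 instead of the full record, while a point lookup of the whole record at level 0 reads a single group. A layout choice of this kind is what trades OLTP cost against OLAP cost across levels.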
