iLSM-SSD: An Intelligent LSM-Tree Based Key-Value SSD for Data Analytics

Several key-value stores, such as RocksDB and MongoDB, are built on top of the file system using the Log-Structured Merge-Tree (LSM-tree). The LSM-tree incurs high compaction overhead. To reduce this overhead, WiscKey, the state-of-the-art LSM-tree, separates keys from values. It appends each value to a Value Log file, and the LSM-tree manages only the key and that value's Value Log offset. Because the LSM-tree then holds fewer SSTables, compaction overhead drops substantially. However, WiscKey still suffers from high I/O stack overhead, because its I/O must pass through the OS file system and block layer. This paper therefore proposes iLSM-SSD, which implements WiscKey inside the SSD and supports near-data processing. iLSM-SSD has the following features: (i) It implements a key-value-separation-based LSM-tree within the limited memory space inside the SSD. (ii) Updating Value Log offsets during Value Log cleaning has a significant performance cost in the CPU- and memory-constrained SSD environment. To minimize this overhead, iLSM-SSD implements Scattered Logging, which reuses invalidated pages within the Value Log. (iii) iLSM-SSD manages the data layout internally. This eliminates the file system interactions that traditional block-interface SSDs require to obtain the data layout for in-storage processing. We prototyped iLSM-SSD on the Cosmos+ OpenSSD platform in a Linux environment. Extensive evaluations with synthetic benchmarks show that the PUT performance of iLSM-SSD is 1.6 to 4 times higher than that of WiscKey implemented in RocksDB.
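The key-value separation described above, together with the page-reuse idea behind Scattered Logging, can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not the iLSM-SSD implementation. A Python dict stands in for the LSM-tree index, and a list of slots stands in for Value Log pages. The class `ToyKVSeparatedStore` and its methods are hypothetical. The sketch also assumes, as one reading of the abstract, that Scattered Logging places new values into previously invalidated Value Log pages instead of relocating valid values during cleaning, so existing offsets never need to be updated.

```python
from typing import Dict, List, Optional


class ToyKVSeparatedStore:
    """Hypothetical toy model of key-value separation (illustration only).

    index      -- stands in for the LSM-tree: maps key -> Value Log offset (page number)
    value_log  -- stands in for the Value Log: one value per page, None if invalid
    free_pages -- invalidated pages available for reuse (assumed Scattered-Logging-style reuse)
    """

    def __init__(self) -> None:
        self.index: Dict[bytes, int] = {}
        self.value_log: List[Optional[bytes]] = []
        self.free_pages: List[int] = []

    def _append_or_reuse(self, value: bytes) -> int:
        # Prefer an invalidated page; otherwise append at the tail of the log.
        if self.free_pages:
            page = self.free_pages.pop()
            self.value_log[page] = value
            return page
        self.value_log.append(value)
        return len(self.value_log) - 1

    def _invalidate(self, page: int) -> None:
        # Mark the page invalid and make it available for later reuse.
        self.value_log[page] = None
        self.free_pages.append(page)

    def put(self, key: bytes, value: bytes) -> None:
        old_page = self.index.get(key)
        # The index stores only the key and the offset, never the value itself.
        self.index[key] = self._append_or_reuse(value)
        if old_page is not None:
            self._invalidate(old_page)

    def get(self, key: bytes) -> Optional[bytes]:
        page = self.index.get(key)
        return None if page is None else self.value_log[page]

    def delete(self, key: bytes) -> None:
        page = self.index.pop(key, None)
        if page is not None:
            self._invalidate(page)
```

In this toy model, overwriting or deleting a key frees its page, and the next PUT fills that page, so the offsets of still-valid values never change and no index updates are needed for them. A real in-SSD design must also handle NAND flash constraints and crash consistency, which this sketch ignores.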