From Flash to 3D XPoint: Performance Bottlenecks and Potentials in RocksDB with Storage Evolution

Storage technologies have undergone continuous innovation in the past decade. The latest advancement in this domain is 3D XPoint memory. As a type of Non-volatile Memory (NVM), 3D XPoint memory promises great improvements in performance, density, and endurance over NAND flash memory. Compared to flash-based SSDs, 3D XPoint-based SSDs, such as Intel's Optane SSD, deliver unprecedentedly low latency and high throughput. These properties are particularly appealing to I/O-intensive applications, and key-value stores are an important class of such applications in data center systems. This paper presents the first in-depth performance study of the impact of this storage hardware evolution on RocksDB, a highly popular key-value store based on the Log-structured Merge tree (LSM-tree). We have conducted extensive experiments for quantitative measurements on three types of SSD devices. Besides confirming the performance gain of RocksDB on 3D XPoint SSDs, our study reveals several unexpected bottlenecks in the current key-value store design that prevent the great performance potential of the new storage hardware from being fully exploited. Based on our findings, we present three exemplary case studies to showcase the efficacy of removing these bottlenecks with simple methods, achieving a performance improvement of up to 18.8%. We further discuss the implications of our findings for system designers and users developing future optimizations. Our study shows that many current LSM-tree based key-value store designs need to be carefully revisited to effectively incorporate the new generation of hardware and realize high-speed data processing.
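For readers unfamiliar with the workload under study, the following is a minimal sketch of the RocksDB key-value interface that such experiments exercise; the database path and options shown here are illustrative placeholders, not the paper's actual benchmark configuration.

```cpp
#include <cassert>
#include <string>
#include "rocksdb/db.h"

int main() {
  rocksdb::DB* db;
  rocksdb::Options options;
  options.create_if_missing = true;  // illustrative option; the study's tuning is not reproduced here

  // Open (or create) a database instance backed by the underlying SSD.
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rocksdb_demo", &db);
  assert(s.ok());

  // Writes go to the in-memory MemTable and the write-ahead log, and are
  // later flushed and compacted into on-disk SSTables by the LSM-tree.
  s = db->Put(rocksdb::WriteOptions(), "key1", "value1");
  assert(s.ok());

  // Reads may traverse multiple LSM-tree levels before hitting storage.
  std::string value;
  s = db->Get(rocksdb::ReadOptions(), "key1", &value);
  assert(s.ok() && value == "value1");

  delete db;
  return 0;
}
```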
