WiscKey: Separating Keys from Values in SSD-conscious Storage

We present WiscKey, a persistent LSM-tree-based key-value store with a performance-oriented data layout that separates keys from values to minimize I/O amplification. WiscKey's design is highly SSD-optimized, leveraging both the sequential and random performance characteristics of the device. We demonstrate the advantages of WiscKey with both microbenchmarks and YCSB workloads. Microbenchmark results show that WiscKey is 2.5× to 111× faster than LevelDB for loading a database and 1.6× to 14× faster for random lookups. WiscKey is faster than both LevelDB and RocksDB in all six YCSB workloads.
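To make the key-value separation idea concrete, here is a minimal sketch under simplifying assumptions. A sorted in-memory key list stands in for the LSM-tree, and an append-only byte buffer stands in for the separate value log. The class and method names (`KVSeparatedStore`, `put`, `get`, `scan`) are hypothetical and are not WiscKey's actual interface. The point it illustrates is that the sorted index holds only each key plus a small (offset, size) pointer, while the values themselves are written sequentially to the log.

```python
import bisect
import io


class KVSeparatedStore:
    """Hypothetical toy store illustrating key-value separation.

    Assumptions: the sorted key list plus dict stand in for an LSM-tree,
    and an in-memory BytesIO stands in for an on-disk value log.
    """

    def __init__(self):
        self._keys = []            # sorted keys (stand-in for the LSM-tree)
        self._addr = {}            # key -> (offset, size) in the value log
        self._vlog = io.BytesIO()  # append-only value log

    def put(self, key: bytes, value: bytes) -> None:
        # Append the value at the end of the log: a purely sequential write.
        offset = self._vlog.seek(0, io.SEEK_END)
        self._vlog.write(value)
        if key not in self._addr:
            bisect.insort(self._keys, key)
        # Only the small pointer enters the index, so keeping the index
        # sorted moves keys and pointers, never the (possibly large) values.
        # An overwritten value stays in the log as garbage to reclaim later.
        self._addr[key] = (offset, len(value))

    def get(self, key: bytes):
        loc = self._addr.get(key)
        if loc is None:
            return None
        offset, size = loc
        # One index lookup, then one random read into the value log.
        self._vlog.seek(offset)
        return self._vlog.read(size)

    def scan(self, start: bytes, count: int):
        # Keys come out in order from the index; each value is a
        # separate random read into the log.
        i = bisect.bisect_left(self._keys, start)
        return [(k, self.get(k)) for k in self._keys[i:i + count]]
```

For example, after `put(b"b", b"value-b")`, `put(b"a", b"value-a")`, and `put(b"b", b"new-b")`, `get(b"b")` returns `b"new-b"` and `scan(b"a", 2)` returns both keys in sorted order. The sketch also shows the trade-off this layout creates: writes become sequential appends, but a range scan turns into one random read per key, which is where an SSD's random-read performance matters.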
