Design patterns for tunable and efficient SSD-based indexes

Many data-intensive systems rely on random hash-based indexes of various forms, such as hash tables, Bloom filters, and locality-sensitive hash tables. In this paper, we present general SSD optimization techniques that can be used to design a variety of such indexes while offering higher performance and easier tunability than specialized state-of-the-art approaches. We leverage two key SSD innovations: a) rearranging the data layout on the SSD so that multiple read requests are combined into a single page read, and b) intelligently reordering requests to exploit the parallelism inherent in the SSD architecture. We build three different indexes using these techniques, and we conduct extensive studies showing their superior performance, lower CPU and memory footprint, and tunability compared to state-of-the-art systems.
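As an illustration of technique (a), the following is a minimal sketch, not the design evaluated in the paper, of a Bloom filter laid out so that all bit positions for a key fall within one page; a lookup then needs a single page read instead of one read per hash function. The class name `PagedBloomFilter`, the 4 KiB page size, the use of SHA-256 with double hashing, and the in-memory `bytearray` pages standing in for SSD pages are all assumptions made for this example. The `batch_lookup` method groups pending lookups by page so that keys sharing a page share one read; it does not model channel-level parallelism.

```python
import hashlib
from collections import defaultdict

PAGE_BITS = 4096 * 8  # assumed 4 KiB page; real devices vary


class PagedBloomFilter:
    """Bloom filter whose k probes for a key all land in one page (hypothetical sketch)."""

    def __init__(self, num_pages, num_hashes=4):
        self.num_pages = num_pages
        self.num_hashes = num_hashes
        # In-memory stand-in for SSD pages.
        self.pages = [bytearray(PAGE_BITS // 8) for _ in range(num_pages)]
        self.page_reads = 0  # counts simulated page reads

    def _locate(self, key):
        """Return (page index, bit offsets within that page) for a bytes key."""
        digest = hashlib.sha256(key).digest()
        page = int.from_bytes(digest[:8], "big") % self.num_pages
        h1 = int.from_bytes(digest[8:16], "big")
        h2 = int.from_bytes(digest[16:24], "big") | 1  # odd step for double hashing
        offsets = [(h1 + i * h2) % PAGE_BITS for i in range(self.num_hashes)]
        return page, offsets

    def _read_page(self, page):
        self.page_reads += 1
        return self.pages[page]

    @staticmethod
    def _test_bits(buf, offsets):
        return all(buf[off // 8] & (1 << (off % 8)) for off in offsets)

    def add(self, key):
        page, offsets = self._locate(key)
        buf = self.pages[page]
        for off in offsets:
            buf[off // 8] |= 1 << (off % 8)

    def might_contain(self, key):
        """Single lookup: exactly one simulated page read."""
        page, offsets = self._locate(key)
        return self._test_bits(self._read_page(page), offsets)

    def batch_lookup(self, keys):
        """Group lookups by page so each distinct page is read once."""
        by_page = defaultdict(list)
        for key in keys:
            page, offsets = self._locate(key)
            by_page[page].append((key, offsets))
        results = {}
        for page, items in by_page.items():
            buf = self._read_page(page)
            for key, offsets in items:
                results[key] = self._test_bits(buf, offsets)
        return results
```

Confining a key's bits to one page trades a somewhat higher false-positive rate for fewer reads, since the bits are no longer spread over the whole filter; how that trade-off is tuned in the paper's indexes is not stated in this abstract.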
