Generic Design Patterns for Tunable and High-Performance SSD-based Indexes

Many data-intensive systems rely on random hash-based indexes of various forms, e.g., hash tables, Bloom filters, and locality-sensitive hash tables. In this paper, we present general SSD optimization techniques that can be used to design a variety of such indexes while ensuring higher performance and easier tunability than specialized state-of-the-art approaches. We leverage two key SSD innovations: (a) rearranging the data layout on the SSD so that multiple read requests are combined into a single page read, and (b) intelligently reordering requests to exploit the inherent parallelism in the architecture of SSDs. We build three different indexes using these techniques and conduct extensive studies showing their superior performance and flexibility.
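To make the first idea concrete, the following is a minimal sketch of a Bloom filter whose bits for any one key all fall inside a single page, so that a membership test costs one page read instead of one read per hash function. It is an illustration under stated assumptions, not the paper's implementation: the 4 KiB page size, the `PagedBloomFilter` class, the SHA-256-based hashing, and the in-memory list that stands in for the SSD are all hypothetical choices made for this example.

```python
import hashlib

PAGE_BYTES = 4096            # assumed flash page size (hypothetical)
PAGE_BITS = PAGE_BYTES * 8


class PagedBloomFilter:
    """Bloom filter that confines all bits of a key to one page.

    The list of bytearrays stands in for SSD pages; a real index
    would issue one page-sized read to the device per lookup.
    """

    def __init__(self, num_pages: int, num_hashes: int = 4) -> None:
        self.num_pages = num_pages
        self.num_hashes = num_hashes
        self.pages = [bytearray(PAGE_BYTES) for _ in range(num_pages)]
        self.page_reads = 0  # counts simulated page reads during lookups

    def _locate(self, key: bytes):
        digest = hashlib.sha256(key).digest()
        # The first hash picks the page; the rest pick bits within that page.
        page = int.from_bytes(digest[:8], "big") % self.num_pages
        h1 = int.from_bytes(digest[8:16], "big")
        h2 = int.from_bytes(digest[16:24], "big") | 1  # odd step for double hashing
        offsets = [(h1 + i * h2) % PAGE_BITS for i in range(self.num_hashes)]
        return page, offsets

    def add(self, key: bytes) -> None:
        page, offsets = self._locate(key)
        buf = self.pages[page]
        for off in offsets:
            buf[off // 8] |= 1 << (off % 8)

    def __contains__(self, key: bytes) -> bool:
        page, offsets = self._locate(key)
        buf = self.pages[page]   # the single simulated page read
        self.page_reads += 1
        return all(buf[off // 8] & (1 << (off % 8)) for off in offsets)


bf = PagedBloomFilter(num_pages=8)
keys = [f"key-{i}".encode() for i in range(1000)]
for k in keys:
    bf.add(k)
found = sum(1 for k in keys if k in bf)  # inserted keys are always reported present
```

Confining a key to one page trades a slightly higher false-positive rate for fewer reads, since each page behaves like a small independent filter. Because every lookup names exactly one page, a batch of lookups could also be grouped or reordered by page number before being issued, which is one simple way to picture the request reordering mentioned in the abstract.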
