Revisiting Database Storage Optimizations on Flash

The database storage hierarchy has been heavily optimized for the performance characteristics of disks. Storage managers typically employ row- or column-oriented storage layouts, or a combination of the two, to improve the I/O performance of different query workloads on disks. The recent rise of flash memory-based solid-state drives (SSDs) has significantly changed the performance characteristics of storage: these drives provide an order of magnitude lower read/access latencies, significantly higher read bandwidths, and, most importantly, negligible seek overheads. In light of these differences, we analyze major storage optimizations for read-optimized databases. We examine the benefits of row- and column-oriented storage layouts on flash SSDs. Our measurements cover workload variations that affect query processing on flash, including selectivity, projectivity, and concurrency. We also investigate the costs and benefits of a set of database optimizations on flash SSDs, including data compression, prefetching, and indexes. Our analytical models back our experimental evaluation of the performance tradeoffs of these optimizations. Three of our key findings are: (1) SSDs scale up linearly with concurrent execution of database queries and outperform disks by up to a factor of two; (2) the low seek cost on SSDs makes column storage a better choice for laying out data on a variety of flash devices; and (3) while data compression is useful to further leverage the bandwidth of flash, database prefetching has less benefit for flash storage. Finally, we present a list of design implications of our findings for future database and operating systems that aim to use flash storage effectively.
