Optimizing Databases by Learning Hidden Parameters of Solid State Drives

Solid State Drives (SSDs) are complex devices whose internal implementations vary, resulting in subtle differences in behavior between devices. In this paper, we demonstrate how a database engine can be optimized for a particular device by learning its hidden parameters. This can not only improve an application’s performance, but also potentially increase the lifetime of the SSD. Our approach for optimizing a database for a given SSD consists of three steps: learning the hidden parameters of the device, proposing rules to analyze the I/O behavior of the database, and optimizing the database by eliminating violations of these rules. We obtain two different characteristics of an SSD, namely the request size profile and the location profile, from which we learn multiple internal parameters. Based on these parameters, we propose rules to analyze the I/O behavior of a database engine. Using these rules, we uncover sub-optimal I/O patterns in SQLite3 and MariaDB when running on our experimental SSDs. Finally, we present three techniques to optimize these database engines: (1) use-hot-locations on SSD-S, which improves the SELECT operation throughput of SQLite3 and MariaDB by 29% and 27%, respectively, and improves the performance of YCSB on MariaDB by 1% to 22% depending on the workload mix; (2) write-aligned-stripes on SSD-T, which reduces the wear-out caused by the SQLite3 write-ahead log (WAL) file by 3.1%; and (3) contain-write-in-flash-page on SSD-T, which reduces the wear-out caused by the MariaDB binary log file by 6.7%.

PVLDB Reference Format:
Aarati Kakaraparthy, Jignesh M. Patel, Kwanghyun Park, and Brian P. Kroth. Optimizing Databases by Learning Hidden Parameters of Solid State Drives. PVLDB, 13(4): 519–532, 2019.
DOI: https://doi.org/10.14778/3372716.3372724
