Programming an SSD Controller to Support Batched Writes for Variable-Size Pages

Exploiting a storage hierarchy is critical to cost-effective data management. However, most systems suffer when data is not in cache, because of the additional I/O needed to move data between SSD and main memory. To improve both cost and performance, some systems use a log-structured store to write a batch of pages at once instead of one block at a time. However, host-based log structuring incurs the additional cost and complexity of garbage collection and recovery, duplicating similar functionality in the SSD's flash translation layer (FTL). In prior work, we presented a customized SSD controller implementation for an Open-Channel SSD that enables host computers to write batches of fixed-size pages. The current work is a major redesign that supports a batched write interface with variable-size pages. Variable-size pages make it easy to support data compression and encryption, and they reduce internal fragmentation within pages, e.g., within a B-tree. The redesign thus further improves I/O performance while making these capabilities easier and more efficient to support.
