nKV: near-data processing with KV-stores on native computational storage

Massive data transfers in modern key/value stores, caused by low data locality and data-to-code system designs, hurt their performance and scalability. Near-data processing (NDP) designs represent a feasible solution; although not new, they have yet to see widespread use. In this paper we introduce nKV, a key/value store that utilizes native computational storage and near-data processing. On the one hand, nKV can directly control the placement of data and computation on the underlying storage hardware. On the other hand, nKV propagates data formats and layouts to the storage device, where software and hardware parsers and accessors are implemented. Together, these allow NDP operations to execute in a host-intervention-free manner, directly on physical addresses, and thus to better utilize the underlying hardware. Our performance evaluation executes traditional KV operations (GET, SCAN) and complex graph-processing algorithms (Betweenness Centrality) in situ, achieving 1.4x to 2.7x better performance on real hardware, the COSMOS+ platform [22].
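To make the idea of propagating data formats to the device more concrete, the following Python sketch shows how a device-side accessor could evaluate GET and SCAN directly on a raw data block once it knows the record layout. The layout (a `<HI` header holding key and value lengths, followed by key and value bytes, with records sorted by key as in an LSM-tree data block) and all names such as `RecordFormat`, `ndp_get`, and `ndp_scan` are hypothetical illustrations, not nKV's actual formats or interfaces. A real device would run such a parser in firmware or as a hardware accessor over physical flash pages.

```python
import struct
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class RecordFormat:
    """Hypothetical format descriptor the host would push down to the device.

    Each record is a fixed header (key length, value length) followed by
    the raw key bytes and then the raw value bytes.
    """
    header: str = "<HI"  # little-endian uint16 key length, uint32 value length


def encode_block(records, fmt: RecordFormat = RecordFormat()) -> bytes:
    """Host side: serialize (key, value) pairs into one key-sorted data block."""
    out = bytearray()
    for key, value in sorted(records):
        out += struct.pack(fmt.header, len(key), len(value))
        out += key + value
    return bytes(out)


def iter_records(block: bytes, fmt: RecordFormat = RecordFormat()) -> Iterator[tuple[bytes, bytes]]:
    """Device side: walk a raw block using nothing but the format descriptor."""
    hdr_size = struct.calcsize(fmt.header)
    off = 0
    while off + hdr_size <= len(block):
        klen, vlen = struct.unpack_from(fmt.header, block, off)
        off += hdr_size
        key = block[off:off + klen]
        off += klen
        value = block[off:off + vlen]
        off += vlen
        yield key, value


def ndp_get(block: bytes, key: bytes, fmt: RecordFormat = RecordFormat()) -> Optional[bytes]:
    """In-situ GET: return the value for `key`, or None if it is absent."""
    for k, v in iter_records(block, fmt):
        if k == key:
            return v
        if k > key:  # records are key-sorted, so the key cannot appear later
            break
    return None


def ndp_scan(block: bytes, lo: bytes, hi: bytes,
             fmt: RecordFormat = RecordFormat()) -> list[tuple[bytes, bytes]]:
    """In-situ SCAN: return pairs with lo <= key < hi; only matches go to the host."""
    result = []
    for k, v in iter_records(block, fmt):
        if k >= hi:  # sorted order: nothing further can qualify
            break
        if k >= lo:
            result.append((k, v))
    return result
```

For example, after `block = encode_block([(b"user:2", b"bob"), (b"user:1", b"alice")])`, the call `ndp_get(block, b"user:2")` returns `b"bob"`. The point of the design is that the device needs only the small format descriptor, not the host's code, to interpret the bytes. As a result, only qualifying records cross the storage interface.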
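Betweenness Centrality, one of the graph workloads named above, is commonly computed with Brandes' algorithm [29]. The Python sketch below implements that algorithm for unweighted graphs. It illustrates why the workload is data-intensive: it runs one breadth-first search per source vertex and touches every adjacency list each time. It is an in-memory illustration only. The adjacency-dict input is an assumption, and how nKV places and executes this computation on the device is not shown.

```python
from collections import deque


def betweenness_centrality(adj: dict[int, list[int]]) -> dict[int, float]:
    """Brandes' algorithm for unweighted graphs.

    `adj` must contain every vertex as a key. For an undirected graph, list each
    edge in both directions and halve the returned scores.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of non-increasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On the undirected star graph `{0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}`, the function assigns 6.0 to the center (3.0 after halving, one for each pair of leaves) and 0.0 to every leaf.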

[1] Chen Luo, et al. LSM-based storage techniques: a survey, 2018, The VLDB Journal.

[2] David A. Patterson, et al. A case for intelligent disks (IDISKs), 1998, SIGMOD Record.

[3] Tobias Vinçon, et al. On the Necessity of Explicit Cross-Layer Data Formats in Near-Data Processing Systems, 2020, 2020 IEEE 36th International Conference on Data Engineering Workshops (ICDEW).

[4] Yang Liu, et al. Willow: A User-Programmable SSD, 2014, OSDI.

[5] Yang Song, et al. An Overview of Microsoft Academic Service (MAS) and Applications, 2015, WWW.

[6] Sangyeun Cho, et al. YourSQL: A High-Performance Database System Leveraging In-Storage Computing, 2016, Proc. VLDB Endow.

[7] Christos Faloutsos, et al. Active Disks for Large-Scale Data Processing, 2001, Computer.

[8] Ilia Petrov, et al. NoFTL-KV: Tackling Write-Amplification on KV-Stores with Native Storage Management, 2018, EDBT.

[9] John Wawrzynek, et al. Chisel: Constructing hardware in a Scala embedded language, 2012, DAC Design Automation Conference 2012.

[10] Ilia Petrov, et al. Native Storage Techniques for Data Management, 2019, 2019 IEEE 35th International Conference on Data Engineering (ICDE).

[11] Javier González, et al. LightNVM: The Linux Open-Channel SSD Subsystem, 2017, FAST.

[12] Sungjin Lee, et al. BlueDBM: An appliance for Big Data analytics, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[13] Prashant J. Shenoy, et al. Rules of thumb in data engineering, 2000, Proceedings of 16th International Conference on Data Engineering.

[14] Manos Athanassoulis, et al. Beyond the Wall: Near-Data Processing for Databases, 2015, DaMoN.

[15] Sang-Won Lee, et al. In-storage processing of database scans and joins, 2016, Inf. Sci.

[16] David J. DeWitt, et al. Query processing on smart SSDs: opportunities and challenges, 2013, SIGMOD '13.

[17] Patrick E. O'Neil, et al. The log-structured merge-tree (LSM-tree), 1996, Acta Informatica.

[18] Chanik Park, et al. Enabling cost-effective data processing with smart SSD, 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).

[19] Jinyoung Lee, et al. Biscuit: A Framework for Near-Data Processing of Big Data Workloads, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[20] Jungwon Kim, et al. PapyrusKV: A High-Performance Parallel Key-Value Store for Distributed NVM Architectures, 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.

[21] Gustavo Alonso, et al. Caribou: Intelligent Distributed Storage, 2017, Proc. VLDB Endow.

[22] Joel H. Saltz, et al. Active disks: programming model, algorithms and evaluation, 1998, ASPLOS VIII.

[23] Rajesh Gupta, et al. Minerva: Accelerating Data Analysis in Next-Generation SSDs, 2013, 2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines.

[24] Stratos Idreos, et al. JAFAR: Near-Data Processing for Databases, 2015, SIGMOD Conference.

[25] Christos Faloutsos, et al. Active Storage for Large-Scale Data Mining and Multimedia, 1998, VLDB.

[26] Gustavo Alonso, et al. Less watts, more performance: an intelligent storage engine for data appliances, 2013, SIGMOD '13.

[27] David J. DeWitt, et al. Database Machines: An Idea Whose Time Passed? A Critique of the Future of Database Machines, 1989, IWDM.

[28] Maurizio Rebaudengo, et al. Kanzi: A Distributed, In-memory Key-Value Store, 2016, Middleware Posters and Demos.

[29] Ulrik Brandes. A faster algorithm for betweenness centrality, 2001, Journal of Mathematical Sociology.