On the Necessity of Explicit Cross-Layer Data Formats in Near-Data Processing Systems

Massive data transfers in modern data-intensive systems, resulting from low data locality and data-to-code system designs, hurt their performance and scalability. Near-data processing (NDP) and a shift to code-to-data designs may represent a viable solution, since packaging combinations of storage and compute elements on the same device has become feasible. The shift towards NDP system architectures calls for a revision of established principles. In traditional DBMS, abstractions such as data formats and layouts typically span multiple layers, and the way they are processed is encapsulated within these layers of abstraction. NDP-style processing, by contrast, requires an explicit definition of cross-layer data formats and accessors, so that in-situ execution can make optimal use of the properties of the underlying NDP storage and compute elements. In this paper, we make the case for such data format definitions and investigate their performance benefits under NoFTL-KV and the COSMOS hardware platform.
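To make the idea of an explicit cross-layer data format more concrete, here is a minimal sketch, assuming fixed-width records packed back-to-back in a page. It is not the NoFTL-KV or COSMOS interface. The names `RecordFormat`, `accessor`, and `in_situ_scan`, along with the `ORDERS` schema, are hypothetical. The "device" is simply simulated in host memory. The key point is that host and device share one descriptor of the layout, which lets device-side code decode and filter individual fields in place. It then returns only the qualifying values rather than whole pages.

```python
import struct
from typing import Callable, Dict, Iterable, List, Tuple


class RecordFormat:
    """Hypothetical explicit record format shared by host and device layers."""

    def __init__(self, fields: Iterable[Tuple[str, str]]):
        # fields: (name, struct type code); '<' = little-endian, no padding.
        self.fields = list(fields)
        self._record = struct.Struct("<" + "".join(code for _, code in self.fields))
        self.size = self._record.size
        # Precompute each field's byte offset so an accessor can read one
        # field without decoding the whole record.
        self._offsets: Dict[str, Tuple[int, struct.Struct]] = {}
        pos = 0
        for name, code in self.fields:
            field_struct = struct.Struct("<" + code)
            self._offsets[name] = (pos, field_struct)
            pos += field_struct.size

    def pack(self, values: Dict[str, object]) -> bytes:
        return self._record.pack(*(values[name] for name, _ in self.fields))

    def accessor(self, name: str) -> Callable[[bytes, int], object]:
        offset, field_struct = self._offsets[name]

        def read(page: bytes, base: int) -> object:
            # Decode only this field of the record that starts at `base`.
            return field_struct.unpack_from(page, base + offset)[0]

        return read


def in_situ_scan(page: bytes, fmt: RecordFormat, field: str,
                 pred: Callable[[object], bool], project: str) -> List[object]:
    """Simulated device-side scan: filter on `field`, return only `project`."""
    get = fmt.accessor(field)
    out = fmt.accessor(project)
    results = []
    # Records are packed back-to-back; any trailing partial record is ignored.
    for base in range(0, len(page) - fmt.size + 1, fmt.size):
        if pred(get(page, base)):
            results.append(out(page, base))
    return results


# Hypothetical schema: 8-byte key, 4-byte quantity, 8-byte price (20 bytes).
ORDERS = RecordFormat([("key", "Q"), ("qty", "I"), ("price", "d")])
page = b"".join(
    ORDERS.pack({"key": k, "qty": 3 * k, "price": 1.5 * k}) for k in range(1, 11)
)
hits = in_situ_scan(page, ORDERS, "qty", lambda q: q > 20, "key")
print(ORDERS.size, len(page), hits)  # 20 200 [7, 8, 9, 10]
```

In this toy run, the host receives four 8-byte keys (32 bytes) instead of the 200-byte page. In a traditional DBMS, the layout knowledge needed for that filtering would reside only in the host's record manager. An explicit, shared descriptor is what allows the scan to move to the data.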