On the necessity of explicit cross-layer data formats in near-data processing systems

Massive data transfers in modern data-intensive systems, which result from low data locality and data-to-code system design, hurt performance and scalability. Near-data processing (NDP) and a shift to code-to-data designs may represent a viable solution, since packaging combinations of storage and compute elements on the same device has become feasible. The shift towards NDP system architectures calls for a revision of established principles. In a traditional DBMS, abstractions such as data formats and layouts typically span multiple layers, and the way they are processed is encapsulated within these layers of abstraction. NDP-style processing requires an explicit definition of cross-layer data formats and accessors, so that in-situ executions can optimally utilize the properties of the underlying NDP storage and compute elements. In this paper, we make the case for such data format definitions and investigate the performance benefits under RocksDB and the COSMOS hardware platform.
