Code Generation Techniques for Raw Data Processing

The motivation of this study was to design an algorithm that speeds up query processing. Its key feature is that code is generated dynamically for each specific query. We present a code generation technique applied to query processing on raw files. The idea is to specialize the query program for a given query and to generate machine- and query-specific source code. The generated code is compiled by GCC, Clang, or any other C/C++ compiler, and the compiled file is dynamically linked into the main program for further processing. Code generation reduces the cost that general-purpose query processing incurs. It also avoids the overhead of conventional interpretation, which helps achieve high performance.
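As an illustration of the generation step only, the following minimal sketch emits C source for a predicate specialized to one query over a comma-separated raw file. It is not the authors' implementation: the `struct query` layout and the names `generate_query_source` and `match_row` are hypothetical, and the query shape (a single integer comparison on one column) is an assumption made for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical query description: keep rows WHERE column <op> value. */
struct query {
    int column;       /* zero-based index of the comma-separated field */
    const char *op;   /* C comparison operator, e.g. ">" or "==" */
    long value;       /* integer constant to compare against */
};

/* Accept only known comparison operators so that arbitrary text
 * cannot be spliced into the generated source. */
static int valid_op(const char *op)
{
    static const char *const ops[] = { "<", "<=", ">", ">=", "==", "!=" };
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (strcmp(op, ops[i]) == 0)
            return 1;
    return 0;
}

/* Writes query-specific C source into out (capacity cap).
 * The column index, operator, and constant become literals in the
 * emitted code, so the compiled predicate contains no per-row
 * interpretation of the query.
 * Returns the source length, or -1 on invalid input or truncation. */
int generate_query_source(const struct query *q, char *out, size_t cap)
{
    assert(q != NULL && out != NULL);
    if (q->column < 0 || q->op == NULL || !valid_op(q->op))
        return -1;

    int n = snprintf(out, cap,
        "#include <stdlib.h>\n"
        "#include <string.h>\n"
        "/* Returns 1 if the CSV line satisfies the predicate. */\n"
        "int match_row(const char *line)\n"
        "{\n"
        "    for (int i = 0; i < %d; i++) {\n"
        "        line = strchr(line, ',');\n"
        "        if (!line) return 0;\n"
        "        line++;\n"
        "    }\n"
        "    return strtol(line, NULL, 10) %s %ld;\n"
        "}\n",
        q->column, q->op, q->value);

    return (n >= 0 && (size_t)n < cap) ? n : -1;
}
```

The emitted text could then be written to a file, compiled as a shared object (for example `cc -O2 -shared -fPIC query.c -o query.so`), and loaded by the main program with `dlopen` and `dlsym` on POSIX systems; those steps are omitted here to keep the sketch short.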
