Paper Information - Towards Efficient Multi-domain Data Processing

Towards Efficient Multi-domain Data Processing

Economy and research increasingly depend on the timely analysis of large datasets to guide decision making. Complex analyses often involve a rich variety of data types and special-purpose processing models. We believe that the database system of the future will use compilation techniques to translate specialized and abstract high-level programming models into scalable low-level operations on efficient physical data formats. We currently envision optimized relational and linear algebra languages, a flexible data flow language (inspired by the programming models of popular data flow engines such as Apache Spark (spark.apache.org) or Apache Flink (flink.apache.org)), and scalable physical operators and formats for relational and array data types. In this article, we propose a database system architecture designed around these ideas and introduce our prototypical implementation of that architecture.
