Functional pearl: a SQL to C compiler in 500 lines of code

We present the design and implementation of a SQL query processor that outperforms existing database systems and is written in roughly 500 lines of Scala code. It serves as a case study showing that high-level functional programming can handily beat C for systems-level programming where the last drop of performance matters. The key enabler is a shift in perspective towards generative programming. The core of the query engine is an interpreter for relational algebra operations, written in Scala. Using the open-source LMS (Lightweight Modular Staging) framework, we turn this interpreter into a query compiler with very low effort. To do so, we capitalize on an old and widely known result from partial evaluation, the Futamura projections, which state that a program that can specialize an interpreter to any given input program is equivalent to a compiler. In this pearl, we discuss LMS programming patterns such as mixed-stage data structures (e.g., data records with a static schema and dynamic field components) and techniques to generate low-level C code, including specialized data structures and data loading primitives.
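To make the idea of turning an interpreter into a compiler concrete, the following is a minimal sketch and not the implementation described in this pearl. It is written in plain Python rather than Scala, and it replaces LMS's typed staging with simple string-based code generation. All names (`Scan`, `Filter`, `Project`, `gen`, `compile_query`) and the column-array layout of the generated C (`emp_dept[i]`, `emp_len`) are assumptions made for illustration. The sketch shows an interpreter over relational algebra operators that emits C instead of computing results, passing along a mixed-stage record whose schema is known at generation time and whose field values are C expressions.

```python
from dataclasses import dataclass

# Hypothetical query plan operators (a tiny relational algebra).
@dataclass
class Scan:
    table: str
    schema: list   # field names, known statically at generation time

@dataclass
class Filter:
    field: str
    value: int
    child: object

@dataclass
class Project:
    fields: list
    child: object

def gen(plan, emit, consume):
    """Staged interpreter: traverses the plan now and emits C code for later.

    `consume` is called with a mixed-stage record: a dict whose keys (the
    schema) are static and whose values are C expressions (dynamic).
    """
    if isinstance(plan, Scan):
        # Assumption: each integer column is a C array named <table>_<field>.
        emit(f"for (int i = 0; i < {plan.table}_len; i++) {{")
        consume({f: f"{plan.table}_{f}[i]" for f in plan.schema})
        emit("}")
    elif isinstance(plan, Filter):
        def on_record(rec):
            # The field lookup happens at generation time; only the
            # comparison itself ends up in the generated code.
            emit(f"if ({rec[plan.field]} == {plan.value}) {{")
            consume(rec)
            emit("}")
        gen(plan.child, emit, on_record)
    elif isinstance(plan, Project):
        gen(plan.child, emit,
            lambda rec: consume({f: rec[f] for f in plan.fields}))
    else:
        raise TypeError(f"unknown operator: {plan!r}")

def compile_query(plan):
    """Specialize the interpreter to one plan, returning a C loop nest."""
    lines = []
    def print_record(rec):
        fmt = ",".join("%d" for _ in rec)
        args = ", ".join(rec.values())
        lines.append(f'printf("{fmt}\\n", {args});')
    gen(plan, lines.append, print_record)
    return "\n".join(lines)

# Roughly: SELECT name_id FROM emp WHERE dept = 7
plan = Project(["name_id"],
               Filter("dept", 7, Scan("emp", ["name_id", "dept", "salary"])))
c_code = compile_query(plan)
print(c_code)
```

In this sketch the plan traversal, the operator dispatch, and the record's schema disappear from the generated code: the `salary` column is never referenced because no operator reads it. This is the effect that specializing an interpreter to a fixed query is meant to achieve, although LMS reaches it through types rather than string concatenation.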
