Forge: generating a high performance DSL implementation from a declarative specification

Domain-specific languages (DSLs) provide a promising path to automatically compiling high-level code to parallel, heterogeneous, and distributed hardware. In practice, however, high-performance DSLs still require considerable software expertise to develop, and they force users into tool-chains that hinder prototyping and debugging. To address these problems, we present Forge, a new meta-DSL for declaratively specifying high-performance embedded DSLs. Forge provides DSL authors with high-level abstractions (e.g., data structures, parallel patterns, effects) for specifying their DSL in a way that permits high performance. From this high-level specification, Forge automatically generates both a naïve Scala library implementation of the DSL and a high-performance version using the Delite DSL framework. Users of a Forge-generated DSL can prototype their application using the library version and then switch to the Delite version to run on multicore CPUs, GPUs, and clusters without changing the application code. Forge-generated Delite DSLs perform within 2x of hand-optimized C++ and up to 40x better than Spark, an alternative high-level distributed programming environment. Compared to a manually implemented Delite DSL, a Forge specification requires 3-6x fewer lines of code and does not sacrifice any performance. Furthermore, Forge specifications can be generated from existing Scala libraries, are easy to maintain, shield DSL developers from changes in the Delite framework, and enable DSLs to be retargeted to other frameworks transparently.
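The idea of running one application unchanged against two generated implementations can be illustrated with a small sketch. The code below is not Forge, Scala, or Delite; it is a hypothetical Python analogue in which all names (`VectorOps`, `LibraryBackend`, `StagedBackend`, `Node`, `app`) are assumptions made for illustration. The eager backend stands in for the naïve library version, and the backend that first builds an intermediate representation stands in for a compiled version, where a real framework would optimize the IR and generate parallel code instead of interpreting it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class VectorOps(ABC):
    """Hypothetical DSL interface: application code only uses these operations."""

    @abstractmethod
    def vector(self, xs): ...

    @abstractmethod
    def scale(self, v, k): ...

    @abstractmethod
    def total(self, v): ...

    @abstractmethod
    def run(self, result): ...


class LibraryBackend(VectorOps):
    """Eager backend: every operation computes its value immediately."""

    def vector(self, xs):
        return list(xs)

    def scale(self, v, k):
        return [k * x for x in v]

    def total(self, v):
        return sum(v)

    def run(self, result):
        # Values are already computed, so there is nothing left to execute.
        return result


@dataclass(frozen=True)
class Node:
    """One node of a toy intermediate representation (IR)."""
    op: str
    args: tuple


class StagedBackend(VectorOps):
    """Deferred backend: operations build an IR, and run() evaluates it.

    This sketch merely interprets the IR; it performs no optimization
    and generates no code.
    """

    def vector(self, xs):
        return Node("vector", (tuple(xs),))

    def scale(self, v, k):
        return Node("scale", (v, k))

    def total(self, v):
        return Node("total", (v,))

    def run(self, result):
        return self._eval(result)

    def _eval(self, node):
        if node.op == "vector":
            return list(node.args[0])
        if node.op == "scale":
            v, k = node.args
            return [k * x for x in self._eval(v)]
        if node.op == "total":
            return sum(self._eval(node.args[0]))
        raise ValueError(f"unknown op: {node.op}")


def app(dsl: VectorOps):
    """Application code written once against the interface."""
    v = dsl.vector([1, 2, 3])
    return dsl.run(dsl.total(dsl.scale(v, 2)))


if __name__ == "__main__":
    print(app(LibraryBackend()))  # prototype with the eager version
    print(app(StagedBackend()))   # same application, deferred version
```

Because `app` depends only on the interface, switching backends is a one-argument change at the call site, which mirrors the abstract's claim that users move from the library version to the high-performance version without editing application code.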
