Composition and Reuse with Compiled Domain-Specific Languages

Programmers who need high performance currently rely on low-level, architecture-specific programming models (e.g., OpenMP for CMPs, CUDA for GPUs, MPI for clusters). Optimizing performance with these frameworks usually requires expertise in the specific programming model and a deep understanding of the target architecture. Domain-specific languages (DSLs) are a promising alternative, allowing compilers to map problem-specific abstractions directly to low-level, architecture-specific programming models. However, developing DSLs is difficult, and using multiple DSLs together in a single application is even harder because existing compiled solutions do not compose. In this paper, we present four new performance-oriented DSLs developed with Delite, an extensible DSL compilation framework. We demonstrate new techniques for composing compiled DSLs that are embedded in a common backend within a single program, and we show that generic optimizations can be applied across the different DSL sections. Our new DSLs are implemented with a small number of reusable components (fewer than 9 parallel operators in total) and still achieve performance up to 125x better than library implementations and at worst within 30% of optimized stand-alone DSLs. The DSLs retain good performance when composed, and applying cross-DSL optimizations yields up to an additional 1.82x improvement.
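
To make the idea of cross-DSL optimization concrete, the following is a minimal sketch in Python, not Delite's actual API or IR. It assumes two toy front ends (a hypothetical "vector DSL" and "stats DSL") that both lower to one shared intermediate representation with a single parallel operator, `Map`. Because the representation is shared, a generic fusion pass can merge operators that originated in different DSLs. All names here (`Source`, `Map`, `vec_scale`, `stats_square`, `fuse`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Shared IR: both toy DSLs lower to these nodes (hypothetical, not Delite's IR).
@dataclass(frozen=True)
class Source:
    data: tuple


@dataclass(frozen=True)
class Map:
    fn: Callable
    child: "Node"


Node = Union[Source, Map]


# Toy "vector DSL" front end: scales every element by k.
def vec_scale(v: Node, k: float) -> Node:
    return Map(lambda x: x * k, v)


# Toy "stats DSL" front end: squares every element.
def stats_square(v: Node) -> Node:
    return Map(lambda x: x * x, v)


def fuse(node: Node) -> Node:
    """Generic pass: collapse Map(f, Map(g, c)) into one Map applying g then f."""
    if isinstance(node, Map):
        child = fuse(node.child)
        if isinstance(child, Map):
            f, g = node.fn, child.fn
            return Map(lambda x, f=f, g=g: f(g(x)), child.child)
        return Map(node.fn, child)
    return node


def count_maps(node: Node) -> int:
    """Count the Map operators (traversals) on the path down to the source."""
    n = 0
    while isinstance(node, Map):
        n += 1
        node = node.child
    return n


def evaluate(node: Node) -> list:
    """Reference interpreter: each Map is one pass over the data."""
    if isinstance(node, Source):
        return list(node.data)
    return [node.fn(x) for x in evaluate(node.child)]


# A program that mixes both toy DSLs, then the fused version of it.
program = stats_square(vec_scale(Source((1.0, 2.0, 3.0)), 2.0))
fused = fuse(program)
```

In this sketch the fusion pass never inspects which front end produced a node; it only sees the shared `Map` operator. That is the property the sketch is meant to illustrate: the unfused program makes two passes over the data, while the fused one makes a single pass and computes the same result.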
