MLIR: Scaling Compiler Infrastructure for Domain Specific Computation

This work presents MLIR, a novel approach to building reusable and extensible compiler infrastructure. MLIR addresses software fragmentation, improves compilation for heterogeneous hardware, significantly reduces the cost of building domain-specific compilers, and helps connect existing compilers together. MLIR facilitates the design and implementation of code generators, translators, and optimizers at different levels of abstraction and across application domains, hardware targets, and execution environments. The contributions of this work include (1) a discussion of MLIR as a research artifact, built for extension and evolution, that identifies the challenges and opportunities this novel design point poses in its design, semantics, optimization specification, system, and engineering; and (2) an evaluation of MLIR as a generalized infrastructure that reduces the cost of building compilers, describing diverse use cases to show research and educational opportunities for future programming languages, compilers, execution environments, and computer architecture. The paper also presents the rationale for MLIR, along with its original design principles, structures, and semantics.