Code Graph Transformations for Verifiable Generation of SIMD-Parallel Assembly Code

The Coconut code generator produces highly efficient assembly code for signal processing applications such as Magnetic Resonance Imaging. It exploits SIMD parallelism and captures as patterns the assembly-language "tricks" that yield very efficient but highly convoluted code. The motivation is to beat the expert assembly tuner while keeping the output reliable and the input maintainable. On a growing set of benchmarks, it produces code with peak or near-peak efficiency. To facilitate formal verification of the resulting code, the intermediate languages used in compilation are all variations on term hypergraphs (jungles) that we call "code graphs". To verify the results of compilation, schedulable code graphs, whose hyperedges are labelled by instructions operating on vectors of components, are transformed by replacing SIMD instructions with non-vector instructions and applying simplification rules; the result is then compared to specifications.
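The verification step described above can be illustrated with a small sketch. This is not Coconut's actual representation or rule set. It is a toy model in Python, under the assumptions that a code graph is a list of hyperedges over named nodes, that vectors have four lanes, and that the instruction names (`vadd`, `vmul`, `vzero`) and the single simplification rule (adding zero) are hypothetical placeholders.

```python
from typing import NamedTuple

WIDTH = 4  # assumed number of lanes per vector register


class Edge(NamedTuple):
    """A hyperedge: one instruction with ordered input and output nodes."""
    op: str
    ins: tuple
    outs: tuple


def lane(node, i):
    """Name of the i-th scalar component of a vector node."""
    return f"{node}[{i}]"


# Hypothetical SIMD instructions and their per-lane scalar counterparts.
SCALAR_OPS = {"vadd": "add", "vmul": "mul", "vzero": "zero"}


def scalarize(edges):
    """Replace each SIMD hyperedge by WIDTH scalar hyperedges, one per lane."""
    result = []
    for e in edges:
        if e.op not in SCALAR_OPS:
            result.append(e)  # already a non-vector instruction
            continue
        for i in range(WIDTH):
            result.append(Edge(SCALAR_OPS[e.op],
                               tuple(lane(n, i) for n in e.ins),
                               tuple(lane(n, i) for n in e.outs)))
    return result


def term_of(node, producers):
    """Unfold the graph below `node` into a term (nested tuples)."""
    if node not in producers:
        return node  # graph input: stays a free variable
    e = producers[node]
    return (e.op,) + tuple(term_of(n, producers) for n in e.ins)


def simplify(term):
    """Apply one illustrative rule bottom-up: add(x, zero) -> x."""
    if isinstance(term, str):
        return term
    op, *args = term
    args = [simplify(a) for a in args]
    if op == "add" and ("zero",) in args:
        rest = [a for a in args if a != ("zero",)]
        return rest[0] if rest else ("zero",)
    return (op, *args)


def verify(edges, output, spec):
    """Check every lane of `output` against the per-lane specification."""
    scalar = scalarize(edges)
    producers = {o: e for e in scalar for o in e.outs}
    return all(simplify(term_of(lane(output, i), producers)) == spec(i)
               for i in range(WIDTH))


# r = (a * b) + 0, computed on whole vectors.
graph = [
    Edge("vmul", ("a", "b"), ("p",)),
    Edge("vzero", (), ("z",)),
    Edge("vadd", ("p", "z"), ("r",)),
]


def spec(i):
    """Specification: each lane of r is the product of the matching lanes."""
    return ("mul", lane("a", i), lane("b", i))


print(verify(graph, "r", spec))  # prints True
```

The sketch compares terms syntactically after simplification, which is the simplest possible notion of "matching the specification"; a realistic checker would need a much richer rule set, including instructions that move data between lanes.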
