MSL: A Synthesis Enabled Language for Distributed Implementations

This paper demonstrates how ideas from generative programming and software synthesis can support the development of bulk-synchronous distributed-memory kernels. These ideas are realized in a new language called MSL, a C-like language that combines synthesis features with high-level notations for array manipulation and bulk-synchronous parallelism; these notations simplify the semantic analysis that synthesis requires. The paper shows that by leveraging these high-level notations, it is possible to scale the synthesis and automated bug-finding technologies that underlie MSL to realistic computational kernels. Specifically, we demonstrate the methodology through case studies that implement non-trivial distributed kernels, both regular and irregular, from the NAS Parallel Benchmarks. We show that our approach can automatically infer many challenging details of these benchmarks and allows high-level implementation ideas to be reused across similar kernels. We also demonstrate that these high-level notations map easily to low-level C code, and that the performance of the generated code matches that of handwritten Fortran.
