Paper information - MSL: A Synthesis Enabled Language for Distributed Implementations

This paper demonstrates how ideas from generative programming and software synthesis can help support the development of bulk-synchronous distributed-memory kernels. These ideas are realized in a new language called MSL, a C-like language that combines synthesis features with high-level notations for array manipulation and bulk-synchronous parallelism to simplify the semantic analysis required for synthesis. The paper shows that by leveraging these high-level notations, it is possible to scale the synthesis and automated bug-finding technologies that underlie MSL to realistic computational kernels. Specifically, we demonstrate the methodology through case studies implementing non-trivial distributed kernels, both regular and irregular, from the NAS parallel benchmarks. We show that our approach can automatically infer many challenging details from these benchmarks and can enable high-level implementation ideas to be reused between similar kernels. We also demonstrate that these high-level notations map easily to low-level C code and show that the performance of this generated code matches that of handwritten Fortran.
