JIT costing adaptive skeletons for performance portability

The proliferation of widely available, but very different, parallel architectures makes performance portability, the ability to deliver good parallel performance across a range of architectures, highly desirable. Irregular parallel problems, where the number and size of tasks are unpredictable, are particularly challenging and require dynamic coordination. This paper outlines a novel approach to delivering portable parallel performance for irregular parallel programs. The approach combines JIT compiler technology with dynamic scheduling and dynamic transformation of declarative parallelism. We specify families of algorithmic skeletons together with equations for rewriting skeleton expressions. We present the design of a framework that unfolds skeletons into task graphs, dynamically schedules tasks, and dynamically rewrites skeletons, guided by a lightweight JIT trace-based cost model, to adapt the number and granularity of tasks to the architecture. We outline the system architecture and a prototype implementation in Racket/Pycket. As the current prototype does not yet perform dynamic rewriting automatically, we present results based on manual offline rewriting. These results demonstrate that (i) the system scales to hundreds of cores given enough parallelism of suitable granularity, and (ii) the JIT trace cost model predicts granularity accurately enough to guide rewriting towards a good adaptive transformation.
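To make the idea of granularity-adapting skeleton rewrites concrete, the following is a minimal sketch in Python rather than the Racket/Pycket of the prototype. It is an illustration under stated assumptions, not the paper's framework: the names `par_map`, `par_map_chunked`, and `choose_chunk_size` are hypothetical, the rewrite shown (a parallel map turned into a parallel map over chunks) is a textbook skeleton equation rather than one taken from the paper, and the per-item cost is supplied by hand where the paper would derive an estimate from JIT traces.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def par_map(f, xs, workers=4):
    """Fine-grained skeleton: one task per element."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, xs))


def par_map_chunked(f, xs, chunk_size, workers=4):
    """Rewritten skeleton: one task per chunk of `chunk_size` elements.

    Illustrative equation (an assumption, not quoted from the paper):
        par_map f xs == concat (par_map (map f) (chunk k xs))
    """
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]

    def run_chunk(chunk):
        # Each task processes its chunk sequentially.
        return [f(x) for x in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        nested = list(pool.map(run_chunk, chunks))
    # Flatten the per-chunk results back into a single list.
    return [y for part in nested for y in part]


def choose_chunk_size(cost_per_item, target_task_cost, n_items):
    """Pick a chunk size so each task's estimated cost reaches the target.

    `cost_per_item` stands in for a cost-model estimate in abstract units;
    here it is a hand-supplied number (hypothetical).
    """
    if cost_per_item <= 0:
        return max(1, n_items)
    size = math.ceil(target_task_cost / cost_per_item)
    return max(1, min(n_items, size))


def square(x):
    return x * x


if __name__ == "__main__":
    xs = list(range(1000))
    k = choose_chunk_size(cost_per_item=5, target_task_cost=100, n_items=len(xs))
    # Both forms compute the same result; only task granularity differs.
    assert par_map(square, xs) == par_map_chunked(square, xs, k)
    print("chunk size:", k)
```

Both forms return the same list; the rewrite changes only how many tasks are created and how much work each one carries. Cheap per-item work leads to larger chunks and fewer tasks, while expensive items keep the fine-grained form.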
