Polyhedral Optimizations for a Data-Flow Graph Language

This paper proposes a novel optimization framework for the Data-Flow Graph Language (DFGL), a dependence-based notation for the macro-dataflow model that can be used as an embedded domain-specific language. Our optimization framework follows a "dependence-first" approach in capturing the semantics of DFGL programs in polyhedral representations, as opposed to the standard polyhedral approach of deriving dependences from access functions and schedules. As a first step, our proposed framework performs two important legality checks on an input DFGL program: checking for potential violations of the single-assignment rule, and checking for potential deadlocks. After these legality checks are performed, the DFGL dependence information is used in lieu of standard polyhedral dependences to enable polyhedral transformations and code generation, which include automatic loop transformations, tiling, and code generation of parallel loops with coarse-grain fork-join and fine-grain doacross synchronizations. Our performance experiments with nine benchmarks on Intel Xeon and IBM Power7 multicore processors show that the DFGL versions optimized by our proposed framework can deliver up to 6.9$\times$ performance improvement relative to standard OpenMP versions of these benchmarks. To the best of our knowledge, this is the first system to encode explicit macro-dataflow parallelism in polyhedral representations so as to provide programmers with an easy-to-use DSL notation with legality checks, while taking full advantage of the optimization functionality in state-of-the-art polyhedral frameworks.
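The two legality checks can be illustrated on a toy, fully enumerated task graph. The sketch below is not the framework's method: the framework works symbolically on polyhedral representations, whereas this version assumes every step instance and item is listed explicitly. The function names and the `puts`/`gets` dictionaries are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def check_single_assignment(puts):
    """Return items written by more than one step instance.

    puts: dict mapping a step instance to the items it writes
    (hypothetical encoding, not DFGL syntax).
    """
    writers = defaultdict(list)
    for step, items in puts.items():
        for item in items:
            writers[item].append(step)
    return {item: steps for item, steps in writers.items() if len(steps) > 1}


def check_deadlock(puts, gets):
    """Return step instances that can never become ready.

    A step is ready once every item it reads has been written by a step
    that has already run. Steps left over after no further progress is
    possible are blocked by a dependence cycle or a missing producer.
    """
    producer = {item: step for step, items in puts.items() for item in items}
    steps = set(puts) | set(gets)
    done = set()
    changed = True
    while changed:
        changed = False
        for step in steps - done:
            if all(producer.get(item) in done for item in gets.get(step, ())):
                done.add(step)
                changed = True
    return steps - done


# Toy program: S(1) reads what S(0) writes; T(0) reads its own output.
puts = {("S", 0): [("A", 0)], ("S", 1): [("A", 1)], ("T", 0): [("B", 0)]}
gets = {("S", 1): [("A", 0)], ("T", 0): [("A", 1), ("B", 0)]}
violations = check_single_assignment(puts)
blocked = check_deadlock(puts, gets)
```

In this toy program no item has two writers, so `violations` is empty, while `T(0)` waits on an item that only it produces and therefore ends up in `blocked`.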
