Exploiting vectorization in high level synthesis of nested irregular loops

Abstract: Synthesis of DoAll loops is a key aspect of High Level Synthesis, since these loops make it easy to exploit the parallelism provided by programmable devices. This type of parallelism can be implemented in several ways: by duplicating the implementation of the loop body, by exploiting loop pipelining, or by applying vectorization. This paper proposes a methodology for the synthesis of nested irregular DoAll loops based on outer vectorization. The methodology transforms the intermediate representation of the DoAll loop to introduce vectorization, and it can be easily integrated into existing state-of-the-art High Level Synthesis flows since it does not require any modification to the rest of the flow. Vectorization is not limited to perfectly nested countable loops: conditional constructs and loops with a variable number of iterations are also supported. Experimental results on parallel benchmarks show that the generated parallel accelerators achieve significant speed-ups, with limited penalties in terms of resource usage and clock frequency.
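To make the idea of outer vectorization on an irregular loop nest more concrete, the following is a minimal source-level sketch in C. It is not the transformation described in the paper, which operates on the intermediate representation; it only illustrates the general technique. The kernel, the function names (`row_sum_scalar`, `row_sum_outer_vec`), and the vector width `LANES` are all hypothetical. The outer loop is a DoAll loop, the inner loop has a data-dependent trip count, and its body contains a conditional. In the vectorized version, `LANES` consecutive outer iterations advance in lockstep, and per-lane masks disable lanes whose inner loop has already finished or whose condition is false.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* hypothetical vector width, chosen only for illustration */

/* Scalar reference: DoAll outer loop over i; the inner loop has a
 * data-dependent trip count (len[i]) and a conditional in its body. */
void row_sum_scalar(const int *data, const int *start, const int *len,
                    int *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int acc = 0;
        for (int j = 0; j < len[i]; ++j) {
            int v = data[start[i] + j];
            if (v > 0)
                acc += v;
        }
        out[i] = acc;
    }
}

/* Outer-vectorized sketch: LANES outer iterations are processed together.
 * The loops over l model the lanes of one vector operation. */
void row_sum_outer_vec(const int *data, const int *start, const int *len,
                       int *out, size_t n)
{
    assert(n % LANES == 0); /* simplifying assumption: no remainder loop */

    for (size_t i = 0; i < n; i += LANES) {
        int acc[LANES] = {0};

        for (int j = 0; ; ++j) {
            int mask[LANES];
            int any_active = 0;

            /* Loop mask: a lane is active while its own inner loop runs. */
            for (int l = 0; l < LANES; ++l) {
                mask[l] = j < len[i + l];
                any_active |= mask[l];
            }
            if (!any_active)
                break; /* every lane has finished its inner loop */

            for (int l = 0; l < LANES; ++l) {
                /* Masked load: inactive lanes never touch memory. */
                int v = mask[l] ? data[start[i + l] + j] : 0;
                /* The conditional becomes a predicate combined with the mask. */
                int take = mask[l] && (v > 0);
                acc[l] += take ? v : 0;
            }
        }

        for (int l = 0; l < LANES; ++l)
            out[i + l] = acc[l];
    }
}
```

In this sketch the inner loop runs for as many steps as the longest lane in each group, so lanes with short trip counts sit idle under their mask. That idle time is the usual cost of handling divergence with masking, and it is one reason speed-up on irregular loops generally falls below the vector width.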
