Efficient nested loop pipelining in high level synthesis using polyhedral bubble insertion

Loop pipelining is a key transformation in high-level synthesis tools, as it helps maximize both computational throughput and hardware utilization. However, it loses some of its efficiency on inner loops with small trip counts, where the pipeline latency overhead quickly dominates the execution time. Although this limitation can be overcome by pipelining the execution of a whole loop nest, nested loop pipelining has so far been applicable only to a very narrow subset of loops, namely perfectly nested loops with constant bounds. In this work we propose to extend the applicability of nested loop pipelining to imperfectly nested loops with affine dependencies by leveraging the so-called polyhedral model. We show how such loop nests can be analyzed and, under certain conditions, how the source code can be modified so that nested loop pipelining can be applied, using a method called polyhedral bubble insertion. We also discuss the implementation of our method in a source-to-source compiler specifically targeted at high-level synthesis tools.
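
To make the latency overhead concrete, the following is a minimal sketch of a simple cycle-count model; it is not taken from this work. It assumes an initiation interval of one cycle, a pipeline of `depth` stages (with `depth` at least 1), and a rectangular two-level nest. The function names `cycles_inner_pipelined` and `cycles_nest_pipelined` are hypothetical, introduced only for illustration.

```c
#include <stdint.h>

/*
 * Simplified model (an assumption, not the paper's cost model):
 * initiation interval = 1, pipeline depth = `depth` stages (depth >= 1),
 * outer loop runs `outer` times, inner loop runs `inner` times.
 */

/* Only the inner loop is pipelined: the pipeline is filled and flushed
 * once per outer iteration, so each outer iteration pays depth - 1
 * extra cycles on top of its `inner` useful cycles. */
uint64_t cycles_inner_pipelined(uint64_t outer, uint64_t inner, uint64_t depth)
{
    return outer * (inner + depth - 1);
}

/* The whole nest is pipelined as one flattened loop: the fill/flush
 * cost of depth - 1 cycles is paid only once. */
uint64_t cycles_nest_pipelined(uint64_t outer, uint64_t inner, uint64_t depth)
{
    return outer * inner + depth - 1;
}
```

Under these assumptions, a nest with 100 outer iterations, 4 inner iterations, and a 10-stage pipeline takes 1300 cycles when only the inner loop is pipelined, against 409 cycles when the whole nest is pipelined. The useful work is 400 cycles in both cases, so utilization rises from roughly 31% to roughly 98%.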
