Modulo scheduling of symbolically tiled loops for tightly coupled processor arrays

On processor arrays, combining modulo scheduling with tiling increases the degree of parallelism beyond what either technique achieves in isolation. To yield code that is independent of the input size, however, tiling must be symbolic: the tile size is unknown at compile time, which introduces parameters into the dependence constraints. Existing approaches to symbolic tiling have so far ignored modulo scheduling. In this paper, we present a compiler algorithm that integrates modulo scheduling with symbolic tiling. The dependence constraints are partitioned into a parametric and a non-parametric subset, and the modulo scheduling problem is solved using only the non-parametric constraints. To still satisfy the parametric dependence constraints, we calculate a minimum tile size from the solution found. If the tile size at runtime is smaller than this minimum, a fallback schedule is chosen instead. We show formally and experimentally that the resulting schedules are latency-optimal if the number of processor elements to map to is known at compile time; otherwise, they deviate only negligibly from the optimum.
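
The flow described above (partition the constraints, schedule with the non-parametric ones, derive a minimum tile size, fall back at runtime) can be illustrated with a small Python sketch. This is a toy model and not the paper's formulation: the `Dep` record, the constraint form `start[dst] - start[src] + II * (const_dist + tile_coeff * p) >= latency`, and all function names are assumptions made for illustration only, and the modulo schedule itself (`start`, `ii`) is taken as given rather than computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dep:
    """Hypothetical dependence: dst must start >= latency cycles after src."""
    src: str
    dst: str
    latency: int
    const_dist: int   # constant part of the iteration distance
    tile_coeff: int   # coefficient of the symbolic tile size p (0 = non-parametric)


def partition(deps):
    """Split constraints into (non-parametric, parametric) subsets."""
    nonparam = [d for d in deps if d.tile_coeff == 0]
    param = [d for d in deps if d.tile_coeff != 0]
    return nonparam, param


def satisfied(dep, start, ii, p=0):
    """Check the assumed constraint form for a concrete tile size p."""
    dist = dep.const_dist + dep.tile_coeff * p
    return start[dep.dst] - start[dep.src] + ii * dist >= dep.latency


def min_tile_size(param, start, ii):
    """Smallest p >= 1 that satisfies every parametric constraint.

    Assumes tile_coeff > 0, so a larger tile only relaxes the constraint.
    """
    p_min = 1
    for d in param:
        if d.tile_coeff <= 0:
            raise ValueError("sketch only handles positive tile coefficients")
        needed = d.latency - (start[d.dst] - start[d.src]) - ii * d.const_dist
        denom = ii * d.tile_coeff
        p_min = max(p_min, -(-needed // denom))  # integer ceiling division
    return p_min


def select_schedule(p, p_min):
    """Runtime choice between the modulo schedule and the fallback."""
    return "modulo" if p >= p_min else "fallback"


# Toy instance: a schedule (start times, initiation interval) assumed to have
# been found from the non-parametric constraints alone.
deps = [Dep("a", "b", 2, 0, 0), Dep("b", "a", 3, 0, 1)]
start, ii = {"a": 0, "b": 2}, 2

nonparam, param = partition(deps)
p_min = min_tile_size(param, start, ii)
```

In this toy instance the parametric constraint reduces to `2 * p >= 5`, so any tile size of at least 3 can use the modulo schedule, and smaller tiles take the fallback path.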
