Loop distribution for K-loops on Reconfigurable Architectures

In the context of reconfigurable architectures, we define a kernel loop (K-loop) as a loop whose body contains one or more kernels mapped on the reconfigurable hardware. In this paper, we analyze how loop distribution can be applied to K-loops. We propose an algorithm for splitting K-loops that contain more than one kernel and intra-iteration dependencies. The purpose is to create smaller loops (K-sub-loops) that have more speedup potential when parallelized. By making use of partial reconfigurability, the K-sub-loops can take advantage of the larger area available, so that multiple kernel instances can execute in parallel on the FPGA. To study the potential performance improvement of applying loop distribution to K-loops, we use a suite of randomly generated test cases. The results show an improvement of more than 40% over previously proposed methods in more than 60% of the cases. The algorithm is also validated with a K-loop extracted from the MJPEG application. The maximum speedup achieved is 8.22 when mapping MJPEG on the Virtex-II Pro with partial reconfiguration and 13.41 when statically mapping it on the Virtex-4.
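As an illustration of the transformation itself (not of the splitting algorithm proposed in the paper), the following minimal sketch shows loop distribution applied to a loop with two kernels and an intra-iteration dependency. The names `kernel_a`, `kernel_b`, `k_loop`, and `distributed_k_loop` are hypothetical, and the kernels are plain software stand-ins for functions that would be mapped on the reconfigurable hardware.

```python
def kernel_a(x):
    # Hypothetical stand-in for a first hardware kernel.
    return x * 2


def kernel_b(t):
    # Hypothetical stand-in for a second hardware kernel.
    return t + 1


def k_loop(data):
    # Original K-loop: both kernels run in every iteration.
    # kernel_b consumes the value produced by kernel_a in the
    # same iteration (an intra-iteration dependency).
    out = []
    for x in data:
        t = kernel_a(x)
        out.append(kernel_b(t))
    return out


def distributed_k_loop(data):
    # After loop distribution: two K-sub-loops, one kernel each.
    # The scalar t is expanded into a list so that the second
    # sub-loop can read what the first one produced.
    tmp = [kernel_a(x) for x in data]   # K-sub-loop 1
    return [kernel_b(t) for t in tmp]   # K-sub-loop 2


if __name__ == "__main__":
    sample = [1, 2, 3]
    print(k_loop(sample), distributed_k_loop(sample))
```

Each sub-loop now contains a single kernel, so, under the assumption of partial reconfiguration, the whole available area could in principle be given to several instances of that one kernel while its sub-loop runs. The cost in this sketch is the intermediate buffer `tmp`.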
