Mapping Optimization of Affine Loop Nests for Reconfigurable Computing Architecture

SUMMARY A reconfigurable computing system is a class of parallel architecture that computes in hardware to increase performance while retaining much of the flexibility of a software solution. This architecture is particularly suitable for regular, compute-intensive tasks, and most such tasks spend the bulk of their running time in nested loops. The polyhedron model is a powerful tool for applying well-founded transformations to such nested loops. This paper addresses several issues in optimizing affine loop nests for a reconfigurable cell array (RCA): how to make the most use of the processing elements (PEs) while minimizing the communication volume through loop transformation in the polyhedron model, how to determine the tiling form from intra-statement dependence analysis, and how to determine the tiling size from the tiling form and the RCA size. Experimental results on a number of kernels demonstrate the effectiveness of the mapping optimization approaches developed. Compared with a DFG-based optimization approach, the execution performance of 1-D Jacobi and matrix multiplication improves by 28% and 48.47%, respectively. Lastly, the run-time complexity is acceptable for practical cases.
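As a rough illustration of what choosing a tiling size from the array size can look like, the sketch below tiles matrix multiplication, one of the kernels named above. It is not the paper's mapping algorithm. The function names and the `tile_i` and `tile_j` parameters (standing in for the rows and columns of a hypothetical PE array) are assumptions made for this example. The only dependence in this kernel is the accumulation along `k`, so the `i` and `j` loops can be tiled freely while `k` stays sequential inside each tile.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] += a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile_i, tile_j):
    """Same computation with the (i, j) space cut into tile_i x tile_j tiles.

    tile_i and tile_j are hypothetical stand-ins for the PE array shape:
    each (i, j) point of a tile could be assigned to one PE, while the
    k loop carries the accumulation dependence and runs in order.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile_i):            # tile origin along i
        for jj in range(0, p, tile_j):        # tile origin along j
            for k in range(m):                # sequential: carries the dependence
                for i in range(ii, min(ii + tile_i, n)):
                    for j in range(jj, min(jj + tile_j, p)):
                        c[i][j] += a[i][k] * b[k][j]
    return c
```

Because the tiling only reorders independent `(i, j)` points and preserves the order of `k` for each point, the tiled version produces the same result as the reference loop for any positive tile sizes, including ones that do not divide the matrix dimensions evenly.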
