Optimizing Spatial Mapping of Nested Loop for

Coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their flexibility and efficiency. Loops in applications are often mapped onto CGRAs for acceleration, but this mapping is challenging because of the parallel execution paradigm and the constrained hardware resources. To map loops onto CGRAs efficiently, it is important to transform them into pieces that obey the hardware resource constraints while incurring little overhead (e.g., communication and configuration overhead). In this paper, we tackle this problem by formulating a performance optimization problem that covers both loop transformation and back-end placement and routing. We also design a novel search strategy to find the optimal result efficiently. Finally, we build a complete flow for mapping loop nests onto CGRAs. Experimental results on most of the PolyBench kernels show that our approach improves kernel performance by 42% on average compared with state-of-the-art methods. The runtime complexity of our approach is also acceptable.
