C-MAP: Improving the Effectiveness of Mapping Method for CGRA by Reducing NoC Congestion

Coarse-Grained Reconfigurable Architecture (CGRA) is considered one of the most promising candidates for big data applications, offering significant throughput improvement and high energy efficiency. Unlike the dynamic-issue superscalar approach of conventional processors, a CGRA uses the static-placement, dynamic-issue (SPDI) execution model, in which the compiler decides how to map instructions onto the distributed processing elements (PEs), and each PE executes an instruction once its required data is ready. Since the dataflow of the program is determined in the logical view, an improper mapping of instructions may lead to more network congestion and hurt performance. Furthermore, finding the optimal mapping for a CGRA has been proven to be an NP-complete (NPC) problem, so it can hardly be achieved in limited time. In this paper, we propose a novel mapping algorithm named Congestion-MAP (C-MAP). C-MAP improves the effectiveness of CGRA mapping by reducing network congestion and enhancing the continuity of the dataflow. C-MAP also accelerates mapping optimization for CGRA by using a network analysis method, which supports fast comparison of mapping plans and parallel exploration. Additionally, with C-MAP, we analyze the impact of several key considerations in CGRA instruction mapping, such as network-on-chip (NoC) workload reduction and workload balance. Experimental results show that C-MAP improves performance by 2.2× in geometric mean.
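To make the link between instruction placement and network congestion concrete, the following is a minimal sketch and not the C-MAP algorithm itself. It assumes a 2D mesh NoC with dimension-ordered (XY) routing and measures congestion as the maximum number of dataflow edges sharing one directed link; the mesh topology, the routing policy, the metric, and all names (`xy_route`, `link_loads`, `congestion`) are illustrative assumptions that do not come from the abstract.

```python
from collections import Counter

def xy_route(src, dst):
    """Directed links from src to dst on a mesh, X first and then Y (assumed routing)."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def link_loads(edges, placement):
    """Count how many dataflow edges traverse each directed mesh link."""
    loads = Counter()
    for producer, consumer in edges:
        for link in xy_route(placement[producer], placement[consumer]):
            loads[link] += 1
    return loads

def congestion(edges, placement):
    """Return (max link load, total hops) for one static placement."""
    loads = link_loads(edges, placement)
    return max(loads.values(), default=0), sum(loads.values())

# Hypothetical dataflow graph: a and b feed c; a and c feed d.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d")]

# Two hypothetical placements of the same four instructions onto PE coordinates.
spread = {"a": (0, 0), "b": (1, 0), "c": (3, 0), "d": (2, 0)}     # 4x1 row
clustered = {"a": (0, 0), "b": (1, 1), "c": (1, 0), "d": (0, 1)}  # 2x2 block

print(congestion(edges, spread))     # (3, 8)
print(congestion(edges, clustered))  # (2, 5)
```

Under these assumptions, the same dataflow graph produces a hotter busiest link and more total hops in the row placement than in the 2x2 placement, which is the kind of difference a congestion-aware mapper would try to exploit when comparing mapping plans.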
