Design space exploration of an optimized compiler approach for a generic reconfigurable array architecture

Abstract: Several mesh-like coarse-grained reconfigurable architectures have been devised in the last few years, together with their corresponding mapping flows. One of the major bottlenecks in mapping algorithms onto these architectures is the limited memory access bandwidth. Only a few mapping methodologies have addressed the limited-bandwidth problem, and none has explored how the resulting performance improvements are affected by the architectural characteristics. In this paper we study the impact that the architectural parameters have on the speedups achieved when the processing elements' (PEs') local RAMs store the variables that have data reuse opportunities. The reused data values are transferred over the internal interconnection network instead of being fetched from external memories, which reduces the data transfer burden on the bus network. We also propose a novel mapping algorithm that uses a list scheduling technique. The experimental results quantify the trade-offs between the performance improvements and the memory access latency, the interconnection network, and the size of each PE's local RAM. To permit such an exploration, our mapping methodology targets a flexible architecture template. More specifically, the experiments show that the improvements increase with the memory access latency, while a richer interconnection topology can improve the operation parallelism by a factor of 1.4 on average. Finally, for the considered set of benchmarks, applying our methodology with a local RAM of 8 words in each PE improved the operation parallelism from 8.6% to 85.1%.
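The abstract does not describe the proposed mapping algorithm in detail, so the following is only a minimal sketch of generic resource-constrained list scheduling, not the authors' method. It assumes unit-latency operations, an acyclic dependence graph, a fixed number of PEs per cycle, and a limited number of external-memory (bus) accesses per cycle. All names (`list_schedule`, `critical_path_priority`, `bus_loads`, `bus_ports`) are hypothetical. A load whose value is already held in a PE's local RAM is modeled simply by leaving it out of `bus_loads`.

```python
from collections import defaultdict


def critical_path_priority(ops, deps):
    """Priority of an op = length of the longest path from it to a sink.

    Assumes the dependence graph is acyclic.
    """
    succs = defaultdict(set)
    for op in ops:
        for pred in deps.get(op, ()):
            succs[pred].add(op)
    memo = {}

    def height(op):
        if op not in memo:
            memo[op] = 1 + max((height(s) for s in succs[op]), default=0)
        return memo[op]

    return {op: height(op) for op in ops}


def list_schedule(ops, deps, num_pes, bus_loads=frozenset(), bus_ports=1):
    """Assign each unit-latency op to a cycle (hypothetical sketch).

    ops       : iterable of operation names
    deps      : dict op -> set of predecessor ops
    num_pes   : operations that can execute per cycle
    bus_loads : ops that must fetch from external memory over the bus
    bus_ports : external-memory accesses allowed per cycle
    """
    if num_pes < 1 or bus_ports < 1:
        raise ValueError("need at least one PE and one bus port")
    ops = list(ops)
    prio = critical_path_priority(ops, deps)
    unscheduled = set(ops)
    schedule = {}
    cycle = 0
    while unscheduled:
        # An op is ready once all predecessors finished in an earlier cycle.
        ready = sorted(
            (op for op in unscheduled
             if all(p in schedule and schedule[p] < cycle
                    for p in deps.get(op, ()))),
            key=lambda op: (-prio[op], op),
        )
        if not ready:
            raise ValueError("no ready operation: unknown predecessor")
        pes_used = bus_used = 0
        for op in ready:
            if pes_used == num_pes:
                break
            if op in bus_loads:
                if bus_used == bus_ports:
                    continue  # bus saturated this cycle; try a non-bus op
                bus_used += 1
            schedule[op] = cycle
            unscheduled.discard(op)
            pes_used += 1
        cycle += 1
    return schedule


# Toy dataflow graph: (l0 + l1) + (l2 + l3), with four loads.
ops = ["l0", "l1", "l2", "l3", "a0", "a1", "a2"]
deps = {"a0": {"l0", "l1"}, "a1": {"l2", "l3"}, "a2": {"a0", "a1"}}

all_on_bus = list_schedule(ops, deps, num_pes=4,
                           bus_loads={"l0", "l1", "l2", "l3"})
with_reuse = list_schedule(ops, deps, num_pes=4, bus_loads={"l0", "l2"})
print(max(all_on_bus.values()) + 1, max(with_reuse.values()) + 1)
```

In this toy graph, treating two of the four loads as local-RAM hits shortens the schedule because fewer operations compete for the single bus port. This illustrates the general effect the abstract describes; it does not reproduce the paper's measurements.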
