A unified evaluation framework for coarse grained reconfigurable array architectures

The efficiency of a coarse-grained reconfigurable array architecture, in terms of performance and hardware cost, is hard to determine. The large number of parameters that define an architecture instance, together with the complexity of mapping, makes evaluation extremely difficult without tool assistance. This paper investigates the four factors directly related to the efficiency of these architectures: area, clock frequency, scheduling efficiency, and performance. A unified exploration framework has been built to estimate the values of these four factors for different architecture alternatives. The framework consists of two parts: (a) an existing retargetable compiler, from which the mapping efficiency is estimated, and (b) a parametric realization of the coarse-grained reconfigurable array in a hardware description language (VHDL). The latter is used to estimate the area and clock frequency of each architecture instance by realizing the system in a 0.13 µm ASIC process technology. The experiments cover architecture instances that differ in the processing elements' interconnection network, the register files' size, their number of input/output ports, and the available bandwidth. In total, 72 architecture scenarios have been studied, revealing how each characteristic influences performance and area so that design decisions can be made efficiently.
