Mapping of regular nested loop programs to coarse-grained reconfigurable arrays - constraints and methodology

Summary form only given. Alongside academic designs, a growing number of commercial coarse-grained reconfigurable arrays have recently been developed. Computationally intensive applications from the areas of video and wireless communication seek to exploit the computational power of such massively parallel SoCs. Conventionally, DSP processors are used in the digital signal processing domain; thus, the existing compilation techniques are closely related to approaches from the DSP world. These approaches employ several loop transformations, such as pipelining or temporal partitioning, but they cannot exploit the full parallelism of a given algorithm or the computational potential of a typical 2-dimensional array. In this paper, (i) we present an overview of the constraints that have to be considered when mapping applications to coarse-grained reconfigurable arrays, (ii) we present our design methodology for mapping regular algorithms onto massively parallel arrays, which is characterized by loop parallelization in the polytope model, and (iii), in a first case study, we adapt our design methodology to target reconfigurable arrays. The case study shows that the presented regular mapping methodology may lead to highly efficient implementations that take the constraints of the architecture into account.
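
As a rough illustration of what loop parallelization in the polytope model involves, the following sketch maps a two-deep FIR loop nest, y[i] = sum over j of a[j] * u[i - j], onto a linear processor array. The FIR example, the dependence vectors, the schedule vector (1, 1), the projection onto j, and all function names are assumptions chosen for illustration; they are not taken from the paper and do not model its architectural constraints.

```python
from itertools import product


def space_time_map(n, taps, schedule=(1, 1),
                   dependences=((0, 1), (1, 0), (1, 1))):
    """Map each iteration (i, j) of an n x taps loop nest to (processor, time).

    Assumed dependences: (0, 1) accumulates y along j, (1, 0) propagates
    a[j] along i, and (1, 1) propagates u along the diagonal.
    """
    # Causality: every dependence must advance time by at least one step.
    for d in dependences:
        if schedule[0] * d[0] + schedule[1] * d[1] < 1:
            raise ValueError(f"schedule {schedule} violates dependence {d}")

    slots = {}
    for i, j in product(range(n), range(taps)):
        t = schedule[0] * i + schedule[1] * j  # linear schedule
        p = j                                  # allocation: project along i
        if (p, t) in slots:
            raise ValueError(f"processor {p} is used twice at time {t}")
        slots[(p, t)] = (i, j)
    return slots


def latency(slots):
    """Number of time steps between the first and last scheduled iteration."""
    times = [t for (_, t) in slots]
    return max(times) - min(times) + 1


if __name__ == "__main__":
    slots = space_time_map(n=4, taps=3)
    print("iterations:", len(slots))
    print("processors:", len({p for (p, _) in slots}))
    print("latency:", latency(slots))
```

In this sketch the 12 iterations of a 4 x 3 loop nest run on 3 processors in 6 time steps instead of 12 sequential steps. A mapping for a real reconfigurable array would also have to respect limits on processing elements, interconnect, and memory, which are not modeled here.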