A holistic approach for tightly coupled reconfigurable parallel processors

New standards in signal, multimedia, and network processing for embedded electronics are characterized by computationally intensive algorithms and by the need for high flexibility due to rapidly changing specifications. To meet the challenges of increasing computational requirements under stringent constraints on area and power consumption in embedded engineering, there is a gradual trend towards coarse-grained parallel embedded processors. Furthermore, such processors are equipped with dynamic reconfiguration features to support time- and space-multiplexed execution of algorithms. However, the formidable problem of efficiently mapping applications (mostly loop algorithms) onto such architectures has hindered their widespread acceptance. In this paper we present (a) a highly parameterizable, tightly coupled, and reconfigurable parallel processor architecture, together with the corresponding power breakdown and reconfiguration time analysis of a case study application, (b) a retargetable methodology for mapping loop algorithms, (c) a co-design framework for modeling, simulation, and programming of such architectures, and (d) loosely coupled communication with a host processor.
