An algorithm for mapping loops onto coarse-grained reconfigurable architectures

With the increasing demand for flexible yet highly efficient architecture platforms for media applications, there is growing interest in Coarse-grained Reconfigurable Architectures (CRAs). While many CRAs have demonstrated impressive performance improvements, the lack of compilation technology for such architectures creates a bottleneck in the current design process. In this paper, we present a novel mapping algorithm designed to support Reconfigurable ALU Array (RAA) architectures, which represent a significant class of CRAs. More specifically, we present a core mapping algorithm that addresses the problem of placing and routing the operations of a loop body onto the ALU array so that they execute in a loop-pipelined fashion. Experimental results using our mapping algorithm on a typical RAA show that it not only compiles very quickly but also generates high-quality mappings that exhibit high memory bandwidth utilization and low global interconnection requirements. Comparison with manual mapping also indicates that our algorithm can generate near-optimal mappings for several loops.