Run-time reconfiguration of communication in SIMD architectures

SIMD processors are increasingly used in embedded systems for multimedia applications because of their area and energy efficiency. Communication between the processing elements (PEs) of an SIMD processor remains a source of inefficiency, however, because the SIMD concept prescribes that all PEs communicate in the same clock cycle. Existing SIMD architectures address this problem either with multi-hop communication (causing cycle overhead) or with a fully connected communication network (causing area overhead). To remove this communication bottleneck, we propose a reconfigurable SIMD architecture (RC-SIMD) with a set of delay-lines in the instruction bus, which distribute the accesses to the communication network over time. The size and number of the delay-lines can be (re)configured; each configuration represents a trade-off between the number of clock cycles and the length of the clock period. Reconfiguration time is typically much less than 1% of the execution time of an algorithm, and the extra configuration hardware is less than 2%. Experiments show that our reconfigurable architecture achieves, on average, more than 10% performance improvement over a non-reconfigurable architecture.
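
The trade-off between the number of clock cycles and the length of the clock period can be illustrated with a toy cost model. The Python sketch below is only an illustration under stated assumptions, not the RC-SIMD implementation: the `Config` type, the helper names, and all cycle counts and clock periods are hypothetical and are not measurements from this work. It assumes that execution time is simply the cycle count multiplied by the clock period, and that reconfiguration cost is negligible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One hypothetical delay-line configuration (all values illustrative)."""
    name: str
    cycles: int       # clock cycles the algorithm needs under this configuration
    period_ns: float  # clock period this configuration permits, in nanoseconds


def execution_time_ns(cfg: Config) -> float:
    """Assumed cost model: execution time = cycle count x clock period."""
    return cfg.cycles * cfg.period_ns


def best_config(configs):
    """Return the configuration with the lowest modelled execution time."""
    return min(configs, key=execution_time_ns)


# Hypothetical candidates; the numbers are made up purely for illustration.
candidates = [
    Config("few cycles, long period", cycles=1000, period_ns=5.0),
    Config("balanced", cycles=1200, period_ns=4.0),
    Config("many cycles, short period", cycles=2000, period_ns=3.0),
]

chosen = best_config(candidates)
```

With these made-up numbers, neither extreme wins: the intermediate configuration has the lowest product of cycles and period. Which configuration is best in practice depends on the algorithm, which is the motivation for selecting it at run time.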