SIMD processors are increasingly used in embedded systems for multimedia applications because of their area and energy efficiency. Communication between the processing elements (PEs) of an SIMD processor remains a source of inefficiency, however, because the SIMD concept prescribes that all PEs communicate in the same clock cycle. Existing SIMD architectures handle this either with multi-hop communication, which costs extra cycles, or with a fully connected communication network, which costs extra area. To relieve this communication bottleneck, we propose a reconfigurable SIMD architecture (RC-SIMD) with a set of delay-lines in the instruction bus that distribute the accesses to the communication network over time. The size and number of delay-lines can be (re)configured; each configuration represents a trade-off between the number of clock cycles and the length of the clock period. Reconfiguration time is typically much less than 1% of the execution time of an algorithm, and the extra configuration hardware overhead is less than 2%. Experiments show that our reconfigurable architecture achieves, on average, more than 10% performance improvement over a non-reconfigurable architecture.
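The delay-line idea and the cycles-versus-clock-period trade-off can be illustrated with a toy model. The sketch below is not the RC-SIMD implementation: it assumes, purely for illustration, that a delay element sits after every `segment_size` PEs on the instruction bus, and the function names and the example configurations (cycle counts and clock periods) are hypothetical.

```python
from collections import Counter


def arrival_cycles(num_pes, segment_size, issue_cycle=0):
    """Cycle in which each PE receives an instruction issued at issue_cycle.

    Assumed toy model: one delay element after every `segment_size` PEs,
    so PE i sees the instruction i // segment_size cycles late.
    """
    return [issue_cycle + i // segment_size for i in range(num_pes)]


def peak_network_accesses(num_pes, segment_size):
    """Largest number of PEs that hit the communication network in one cycle."""
    per_cycle = Counter(arrival_cycles(num_pes, segment_size))
    return max(per_cycle.values())


def execution_time(cycles, clock_period_ns):
    """Execution time is the product the configuration trades off."""
    return cycles * clock_period_ns


def best_configuration(configs):
    """Pick the configuration name with the smallest execution time.

    `configs` maps a name to a (cycles, clock_period_ns) pair.
    """
    return min(configs, key=lambda name: execution_time(*configs[name]))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trade-off.
    configs = {
        "no_delay_lines": (1000, 5.0),   # fewer cycles, longer clock period
        "segments_of_2": (1003, 4.0),    # a few extra cycles, shorter period
    }
    print(arrival_cycles(8, 2))
    print(peak_network_accesses(8, 8), peak_network_accesses(8, 2))
    print(best_configuration(configs))
```

In this model, a segment size equal to the number of PEs reproduces plain SIMD behaviour (all PEs access the network in the same cycle), while smaller segments spread the accesses over several cycles at the cost of extra latency.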
[1] Henk Corporaal, et al. RC-SIMD: Reconfigurable communication SIMD architecture for image processing applications. J. Embed. Comput., 2006.
[2] Qin Zhao, et al. Constraint analysis for code generation: basic techniques and applications in FACTS. TODE, 2000.
[3] H. Peter Hofstee, et al. Power efficient processor architecture and the cell processor. 11th International Symposium on High-Performance Computer Architecture, 2005.
[4] Henk Corporaal, et al. Designing Area and Performance Constrained SIMD/VLIW Image Processing Architectures. ACIVS, 2005.
[5] Henk Corporaal, et al. Benchmarks for SmartCam Development. 2003.
[6] Honglin Wu, et al. Power complexity of multiplexer-based optoelectronic crossbar switches. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2005.