A parallel arithmetic array for accelerating compute-intensive applications

This study proposes a parallel arithmetic array processor for accelerating compute-intensive applications in low-power embedded systems. The proposed flexible hardware architecture enables fast execution of both control-dominated and compute-centric streaming tasks on the same array. As a result, multiple levels of parallelism can be exploited efficiently. A test chip integrating two 16×16 array processor cores was implemented in 65 nm CMOS technology. Multi-format video decoding algorithms were mapped onto the chip as benchmarks. The proposed architecture achieved a 2.8× performance advantage over an industrial coarse-grained array processor and a 66% performance improvement over a state-of-the-art many-core processor, while improving energy efficiency by 15.3× and 1.78×, respectively.