Computational acceleration methodologies: advantages of reconfigurable acceleration subsystems

Computational bottlenecks are endemic to digital signal processing (DSP) systems. Several approaches have been developed to eliminate these bottlenecks cost-effectively. Application-specific integrated circuits (ASICs) resolve specific bottlenecks effectively, but for most applications they entail long development times and prohibitively high non-recurring engineering costs. Multi-processor architectures are cost-effective, but their performance gains are at best linear in the number of processors. We show how properly targeted and designed reconfigurable acceleration subsystems (RASs), implemented using field-programmable gate arrays (FPGAs), can resolve computational bottlenecks cost-effectively for a broad range of DSP applications. We propose a model that quantifies the benefits of computational acceleration on reconfigurable platforms and determines which DSP applications are amenable to it. We then describe the architecture and functionality of X-CIM™, a reconfigurable acceleration subsystem recently introduced by MiroTech Microsystems. X-CIM functions as a reconfigurable co-processor for TMS320C4x DSP processors. On a benchmark application consisting of a complex non-linear algorithm, a TMS320C40 processed images in 6977 ms working alone and in 182 ms when supported by an X-CIM co-processor, a speedup of roughly 38x.
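
The benchmark figures above can be turned into a speedup with a few lines of arithmetic. The sketch below is illustrative only: it does not reproduce the model proposed in the paper, which the abstract does not specify. It assumes the standard Amdahl's-law relation instead, and all function names are hypothetical. Under that assumption, the observed speedup also gives a lower bound on the fraction of the original runtime that must have been offloaded to the co-processor.

```python
# Illustrative sketch only: Amdahl's law is an assumption here,
# not the acceleration model proposed in the paper.

def speedup(t_baseline_ms: float, t_accelerated_ms: float) -> float:
    """Ratio of the unaccelerated runtime to the accelerated runtime."""
    return t_baseline_ms / t_accelerated_ms


def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def min_accelerated_fraction(observed_speedup: float) -> float:
    """Smallest runtime fraction that must be accelerated to reach the observed
    speedup, obtained by letting the kernel speedup tend to infinity."""
    return 1.0 - 1.0 / observed_speedup


# Benchmark timings quoted in the abstract (TMS320C40 alone vs. with X-CIM).
s = speedup(6977, 182)
f_min = min_accelerated_fraction(s)

print(f"overall speedup: {s:.1f}x")
print(f"minimum accelerated fraction: {f_min * 100:.1f}%")
```

Under these assumptions, a speedup of this size is only reachable if nearly all of the original runtime sits in the accelerated portion, which is consistent with the abstract's emphasis on properly targeting the bottleneck.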