An optimization method of DMA transfer for a general purpose reconfigurable machine

DMA transfer between a CPU and an FPGA is often a bottleneck in current reconfigurable machines. On machines such as the SRC-6, DMA transfer supports streaming processing with on-board memory interleaving, but for applications with severe FPGA resource constraints the CPU must first reorder the data as a pre-processing step for the interleaving. This paper empirically evaluates the reordering overhead to reveal the trade-off point. The results show that interleaved streaming DMA achieves a speedup when the transferred data strings are 150 KB or smaller.
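
To make the CPU-side reordering step concrete, the following is a minimal sketch under stated assumptions, not the paper's actual method. It assumes the DMA engine distributes consecutive words round-robin across a fixed number of on-board memory banks, and that the FPGA logic wants each bank to hold one contiguous chunk of the original data. The function names, the bank count, and the round-robin model are all hypothetical.

```python
def reorder_for_interleave(data, num_banks):
    """Permute `data` on the CPU so that a round-robin interleaved
    transfer leaves the k-th contiguous chunk of `data` in bank k.

    Assumption: word j of the transferred stream lands in bank
    j % num_banks (hypothetical model of the interleaving hardware).
    """
    if len(data) % num_banks != 0:
        raise ValueError("data length must be a multiple of num_banks")
    chunk = len(data) // num_banks
    out = [None] * len(data)
    for bank in range(num_banks):
        for i in range(chunk):
            # The i-th word destined for `bank` sits at stream
            # position i * num_banks + bank.
            out[i * num_banks + bank] = data[bank * chunk + i]
    return out


def simulate_interleave(stream, num_banks):
    """Model the assumed round-robin distribution into banks."""
    return [stream[b::num_banks] for b in range(num_banks)]


if __name__ == "__main__":
    words = list(range(8))
    reordered = reorder_for_interleave(words, 4)
    print(reordered)
    print(simulate_interleave(reordered, 4))
```

In this model the reordering touches every word once on the CPU, so its cost grows with the transfer size. That per-word cost is the kind of overhead the abstract weighs against the benefit of streaming.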
