Parallel Processing on FPGAs: The Effect of Profiling on Performance

The processing elements, logic resources, and on-chip block RAMs of modern FPGAs can be used not only for prototyping custom hardware modules but also for parallel processing, by implementing multiple processors that cooperate on a single task. This paper compares the performance of a single-processor implementation with two types of dual-processor implementations of the widely used radix-2 n-point FFT algorithm (Cooley and Tukey, 1965), in terms of processing speed and FPGA resource utilization. In the first dual-processor implementation, the partitioning is based on the computational complexity of the radix-2 FFT algorithm, O(n log n). In the second, the partitioning is based on a detailed profiling procedure applied to each line of code in the single-processor implementation. The results show that the first dual-processor implementation is on average 1.3 times faster than the single-processor implementation, whereas the second is about 1.9 times faster, which is very close to the expected speedup (ideally 2 for two processors). This result shows that detailed profiling, which takes all contributing factors into account, is crucial for identifying the bottlenecks of an algorithm, so that the algorithm can be mapped efficiently onto a multiprocessor system.
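The contrast between partitioning by asymptotic complexity and partitioning by measured cost can be illustrated with a small Python sketch. It is not the authors' procedure, and it rests on stated assumptions: the section names in `SECTIONS` and the cycle counts in `COSTS` are hypothetical (not results from this paper), the two processors are assumed to run concurrently (for example as two pipeline stages over a stream of FFT frames) with no communication overhead, and the work is split at a single contiguous cut. Under those assumptions the speedup is the total cost divided by the busier processor's share.

```python
from typing import List, Tuple

# Hypothetical per-section cycle counts, as a line-by-line profiler might
# report them for a single-processor FFT. Values are made up for illustration.
SECTIONS = ["load_and_bit_reverse", "stage_1", "stage_2", "stage_3", "stage_4"]
COSTS = [40.0, 20.0, 20.0, 20.0, 20.0]


def split_speedup(costs: List[float], k: int) -> float:
    """Speedup of running costs[:k] on one processor and costs[k:] on the other.

    Assumes both processors run concurrently and ignores communication
    overhead, so throughput is limited by the busier processor.
    """
    left = sum(costs[:k])
    right = sum(costs[k:])
    return sum(costs) / max(left, right)


def best_two_way_split(costs: List[float]) -> Tuple[int, float]:
    """Try every contiguous split point and keep the smallest bottleneck."""
    total = sum(costs)
    best_k, best_bottleneck = 0, total
    prefix = 0.0  # equals sum(costs[:k]) at the top of each iteration
    for k in range(len(costs) + 1):
        bottleneck = max(prefix, total - prefix)
        if bottleneck < best_bottleneck:
            best_k, best_bottleneck = k, bottleneck
        if k < len(costs):
            prefix += costs[k]
    return best_k, total / best_bottleneck


if __name__ == "__main__":
    # Complexity-style split: two of the four butterfly stages per processor,
    # with the input handling (not counted by O(n log n)) left on processor 1.
    naive = split_speedup(COSTS, 3)
    k, best = best_two_way_split(COSTS)
    print(f"complexity-based split speedup: {naive:.2f}")
    print(f"profile-based split after {SECTIONS[k - 1]}: speedup {best:.2f}")
```

With these made-up numbers, the complexity-style split gives a speedup of 1.5 because the first processor also carries the unaccounted input handling, while the profile-guided cut after `stage_1` balances both processors at 60 cycles and reaches the ideal 2.0. Real partitions would also have to account for inter-processor communication, which this sketch deliberately omits.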

[1] S. Haykin. Neural Networks: A Comprehensive Foundation, 1994.

[2] Paul R. Schumacher, et al. A single program multiple data parallel processing platform for FPGAs, 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[3] Daniel Tabak. High-performance RISC systems, 1989, Microprocessors and Microsystems.

[4] J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series, 1965, Mathematics of Computation.

[5] Reinhold Weicker, et al. Dhrystone benchmark: rationale for version 2 and measurement rules, 1988, SIGPLAN Notices.

[6] Kurt Keutzer, et al. An FPGA-based soft multiprocessor system for IPv4 packet forwarding, 2005, International Conference on Field Programmable Logic and Applications.