Multiple data set reduction on FPGAs

Many scientific and engineering applications reduce multiple sets of sequential data streams. If the core operator of the reduction is deeply pipelined, dependencies between input data elements cause data hazards in the pipeline. To tackle this problem, we propose a reduction design for multiple variable-length data sets that achieves low latency and high pipeline utilization. We prove bounds on its buffer size and execution time, and then demonstrate its performance in practical multiple-data-set scenarios. We apply the proposed method to Householder QR decomposition and compare it with other methods, obtaining superior results. The design is implemented on FPGAs, and we report its resource usage and performance.
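
To make the data hazard concrete, the following is a minimal cycle-counting sketch. It is not the design proposed here. It assumes a hypothetical adder with a fixed pipeline depth `latency`, a single data set, and integer inputs; the function names are illustrative. A single accumulator must wait `latency` cycles between dependent additions, whereas the well-known trick of keeping `latency` independent partial sums lets one addition issue per cycle while the data streams in.

```python
def reduce_naive(values, latency):
    """Sum with one accumulator; each add waits for the previous result."""
    acc = values[0]
    cycles = 0
    for v in values[1:]:
        cycles += latency  # stall until the previous add leaves the pipeline
        acc = acc + v
    return acc, cycles


def reduce_interleaved(values, latency):
    """Sum with `latency` independent partial sums (illustrative assumption)."""
    partial = [0] * latency
    for i, v in enumerate(values):
        # Slot i % latency was last written `latency` cycles ago,
        # so its result is already available: no stall.
        partial[i % latency] += v
    # One add issues per cycle; the last result emerges `latency` cycles later.
    stream_cycles = len(values) + latency - 1
    # Combine the partial sums with the naive scheme.
    total, combine_cycles = reduce_naive(partial, latency)
    return total, stream_cycles + combine_cycles


values = list(range(1, 101))
print(reduce_naive(values, 8))        # (5050, 792)
print(reduce_interleaved(values, 8))  # (5050, 163)
```

The interleaved scheme keeps the pipeline busy for one long data set, but it does not address back-to-back sets of varying length, which is the case the abstract targets.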
