Reconfigurable hardware solution to parallel prefix computation

Abstract This paper presents the design and implementation of efficient, reconfigurable hardware for parallel prefix computation on field-programmable gate arrays (FPGAs). The design is based on a pipelined dataflow algorithm, with control logic added to reconfigure the system for an arbitrary degree of parallelism. The system receives multiple input streams of elements in parallel and produces output streams in parallel, and the degree of parallelism can be controlled explicitly at run time. The time complexity of the design is O(d+(N−d)/d), where d is the degree of parallelism and N is the stream size. When the stream size is sufficiently larger than the initial trigger time of the pipeline (d), the time complexity becomes O(N/d). Unlike the prefix computation circuits found in the literature, the design scales to different problem sizes, including data of unknown size. The design is modular, based on a finite state machine, and was implemented and tested on the target FPGA devices Xilinx Spartan2S XC2S300EFT256-6Q and XC2S600EFG676-6.
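To make the stated bound concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes a simple blocked scheme in which d elements are consumed per step and a running result is carried between steps. The names `blocked_prefix` and `stated_time` are hypothetical and chosen only for illustration. The input is read as an iterator, so the stream length does not need to be known in advance.

```python
from itertools import accumulate, islice
import operator


def blocked_prefix(stream, d, op=operator.add):
    """Software model only (assumed scheme, not the FPGA pipeline):
    consume d elements per step and carry the running prefix forward.
    `op` must be associative. Returns (prefix results, number of steps)."""
    if d < 1:
        raise ValueError("parallelism degree d must be at least 1")
    it = iter(stream)          # works for streams of unknown length
    out, carry, steps = [], None, 0
    while True:
        block = list(islice(it, d))   # up to d elements arrive together
        if not block:
            break
        local = list(accumulate(block, op))   # prefix within the block
        if carry is not None:
            # Combine the carried result on the left to preserve order.
            local = [op(carry, x) for x in local]
        out.extend(local)
        carry = local[-1]
        steps += 1
    return out, steps


def stated_time(n, d):
    """The expression inside the abstract's bound: d + (N - d) / d."""
    return d + (n - d) / d


if __name__ == "__main__":
    print(blocked_prefix([1, 2, 3, 4, 5, 6, 7], d=3))
    print(stated_time(1024, 4), 1024 / 4)
```

For N = 1024 and d = 4 the expression evaluates to 259 against N/d = 256, which illustrates why the start-up term d becomes negligible once the stream is much longer than the pipeline's trigger time.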
