Design space exploration for field programmable compressor trees

The Field Programmable Compressor Tree (FPCT) is a programmable compressor tree (e.g., a Wallace or Dadda tree) intended for integration into an FPGA or other reconfigurable device. This paper presents a design space exploration (DSE) method that identifies the best FPCT architecture for a given set of arithmetic benchmark circuits; in practice, an FPGA vendor can use the DSE to tailor the FPCT to the most important benchmark circuits of its largest-volume clients. One novel feature of the DSE is a metric called I/O utilization, which we found to correlate strongly with both the critical path delay and the area of the benchmark circuits under study. Pruning the search space using I/O utilization significantly reduced the number of FPCTs that must be synthesized and evaluated during the DSE, while giving high confidence that the best architectures are still explored. The DSE was applied to seven small-to-medium benchmark circuits; one FPCT architecture was found that was 30% faster than the second-best architecture in terms of critical path delay and only 3.34% larger than the smallest.
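The pruning step described above can be illustrated with a minimal sketch. The abstract does not define I/O utilization, so the definition used here (ports used by a mapped benchmark divided by ports available on the FPCT) is an assumption made only for illustration, as are the names `Candidate`, `io_utilization`, and `prune` and the choice to rank candidates by their average score. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical FPCT architecture, described only by its port counts."""
    name: str
    inputs: int
    outputs: int


def io_utilization(used_inputs, used_outputs, cand):
    # ASSUMED definition: fraction of the FPCT's ports that a mapped
    # benchmark actually uses. The paper's exact metric may differ.
    return (used_inputs + used_outputs) / (cand.inputs + cand.outputs)


def prune(candidates, usage, keep):
    """Keep the `keep` candidates with the highest mean I/O utilization.

    usage maps a candidate name to a list of (used_inputs, used_outputs)
    pairs, one pair per benchmark circuit.
    """
    scored = []
    for cand in candidates:
        per_bench = [io_utilization(i, o, cand) for i, o in usage[cand.name]]
        scored.append((sum(per_bench) / len(per_bench), cand))
    # Highest average utilization first; only these would be synthesized.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored[:keep]]


candidates = [Candidate("A", 16, 8), Candidate("B", 32, 16), Candidate("C", 8, 4)]
usage = {
    "A": [(12, 6), (16, 8)],
    "B": [(12, 6), (16, 8)],
    "C": [(8, 4), (8, 4)],
}
survivors = prune(candidates, usage, keep=2)
print([c.name for c in survivors])
```

In this toy setup the oversized candidate "B" is discarded before any synthesis run, which mirrors the idea of using a cheap metric to shrink the set of architectures that must be fully evaluated.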
