Paper Information - High-Performance Accurate and Approximate Multipliers for FPGA-Based Hardware Accelerators

Multiplication is one of the most widely used arithmetic operations in a variety of applications, such as image/video processing and machine learning. FPGA vendors provide high-performance multipliers in the form of DSP blocks. These multipliers are limited in number and have fixed locations on the FPGA. They can also introduce additional routing delays and may prove inefficient for smaller bit-width multiplications. Therefore, FPGA vendors additionally provide optimized soft IP cores for multiplication. In this work, however, we argue that these soft multiplier IP cores still need better designs to provide high performance and resource efficiency. To this end, we present generic, area-optimized, low-latency accurate and approximate softcore multiplier architectures. They exploit the underlying architectural features of FPGAs, i.e., lookup table (LUT) structures and fast-carry chains, to reduce the overall critical path delay (CPD) and resource utilization of multipliers. Compared with the Xilinx multiplier LogiCORE IP, our proposed unsigned and signed accurate architectures provide up to 25% and 53% reduction in LUT utilization, respectively, for different sizes of multipliers. Moreover, our unsigned approximate multiplier architectures achieve a reduction of up to 51% in the CPD, compared with the LogiCORE IP, with an insignificant loss in output accuracy. For illustration, we have deployed the proposed multiplier architectures in accelerators used in image and video applications and evaluated them for area and performance gains. Our library of accurate and approximate multipliers is open-source and available online at https://cfaed.tu-dresden.de/pd-downloads to fuel further research and development in this area, facilitate reproducible research, and thereby enable a new research direction for the FPGA community.
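To make the accuracy trade-off of approximate multiplication concrete, the following is a minimal Python sketch of how output error can be measured exhaustively for a small unsigned multiplier. It is not the architecture proposed in the paper: the approximation used here (dropping the least significant columns of every partial product) is a generic textbook-style simplification chosen only for illustration, and the names `partial_products`, `accurate_multiply`, `approximate_multiply`, `error_stats`, and the parameter `dropped_columns` are all hypothetical.

```python
def partial_products(a, b, width):
    """Shifted partial products a * b_i * 2^i of an unsigned width-bit multiplier."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def accurate_multiply(a, b, width=8):
    """Exact product, obtained by summing all partial products."""
    return sum(partial_products(a, b, width))


def approximate_multiply(a, b, width=8, dropped_columns=4):
    """Illustrative approximation (an assumption, not the paper's design):
    clear the lowest `dropped_columns` bit positions of every partial product
    before summation, so those columns need no adder logic."""
    mask = ~((1 << dropped_columns) - 1)
    return sum(pp & mask for pp in partial_products(a, b, width))


def error_stats(width=8, dropped_columns=4):
    """Exhaustively compare the approximate and exact products.

    Returns (maximum absolute error, mean relative error over nonzero products).
    """
    max_abs = 0
    total_rel = 0.0
    nonzero = 0
    for a in range(1 << width):
        for b in range(1 << width):
            exact = accurate_multiply(a, b, width)
            approx = approximate_multiply(a, b, width, dropped_columns)
            err = exact - approx  # truncation never overestimates, so err >= 0
            max_abs = max(max_abs, err)
            if exact:
                total_rel += err / exact
                nonzero += 1
    return max_abs, total_rel / nonzero


if __name__ == "__main__":
    max_abs, mean_rel = error_stats(width=8, dropped_columns=4)
    print(f"max absolute error: {max_abs}")
    print(f"mean relative error: {mean_rel:.6f}")
```

Exhaustive evaluation is practical here because an 8x8 multiplier has only 65,536 input pairs. For wider operands, random sampling of the input space would be the usual substitute.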
