Achieving Performance Speed-up in FPGA Based Bit-Parallel Multipliers using Embedded Primitive and Macro support

Modern Field Programmable Gate Arrays (FPGAs) are moving rapidly into the consumer market, and their domain has expanded from prototype design to low- and medium-volume production. FPGAs are proving to be an attractive replacement for Application Specific Integrated Circuits (ASICs), primarily because of the low non-recurring engineering (NRE) costs associated with FPGA platforms. This has prompted FPGA vendors to improve the capacity and flexibility of the underlying primitive fabric and to include specialized macro support and intellectual property (IP) cores in their offerings. However, most work on FPGA implementations does not take full advantage of these offerings, primarily because designers rely mainly on technology-independent optimization to enhance system performance and neglect the speed-up achievable with these embedded primitives and macros. In this paper, we consider the technology-dependent optimization of fixed-point bit-parallel multipliers, implementing them with the embedded primitives and macro support inherent in modern FPGAs. Our implementation targets three FPGA families: Spartan-6, Virtex-4 and Virtex-5. The results indicate that a considerable speed-up is achievable using these embedded FPGA resources.
