Parallel architecture of power-of-two multipliers for FPGAs

This research work presents a novel approach to designing efficient power-of-two multipliers on modern field-programmable gate array (FPGA) devices. Several ways of exploiting fixed-point power-of-two multiplications have recently been shown to reduce the computational complexity of demanding applications, such as computer vision and deep learning, among others. Modern FPGA devices provide speed-optimised intellectual property (IP) cores based on embedded modules, such as digital signal processing (DSP) blocks, and area-optimised IP cores based on reconfigurable logic resources, such as look-up tables and flip-flops. Unfortunately, owing either to their limited availability or to their limited operating frequency, these IP cores do not allow the full computational capability of an FPGA device to be exploited. The speed-optimised version of the multiplier proposed here increases the number of operations performed per second by up to 4.3 times with respect to conventional designs, while its area-optimised implementation reduces resource requirements and energy consumption by up to 22% and 40%, respectively.
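As an illustration only, assume that a "power-of-two multiplication" means multiplying a fixed-point operand by a weight constrained to ±2^k. Under that assumption the product needs no partial-product array at all, only a shift and an optional negation, which is why such operations are cheap. The Python sketch below models that arithmetic behaviour; the function name `pow2_multiply`, the 4-fractional-bit format and the truncating right shift are hypothetical choices, and the code is not the parallel FPGA architecture proposed in this work.

```python
def pow2_multiply(x: int, k: int, negative: bool = False) -> int:
    """Multiply the fixed-point integer x by the weight (-1)**negative * 2**k.

    Hypothetical helper for illustration: x is the raw two's-complement integer
    of a fixed-point value, and the result keeps the same number of fractional
    bits. Only a shift and an optional negation are used, no multiplier.
    """
    if k >= 0:
        y = x << k   # left shift: exact product x * 2**k
    else:
        y = x >> -k  # arithmetic right shift: floor(x / 2**-k), i.e. truncation
    # The sign of the weight is applied after the shift.
    return -y if negative else y


if __name__ == "__main__":
    FRAC_BITS = 4                    # assumed Q-format with 4 fractional bits
    x = int(1.5 * (1 << FRAC_BITS))  # 1.5 is stored as the raw value 24
    for k, neg in [(-1, False), (2, True)]:
        raw = pow2_multiply(x, k, neg)
        print(f"k={k}, negative={neg}: raw={raw}, value={raw / (1 << FRAC_BITS)}")
```

For non-negative exponents the shift reproduces ordinary multiplication exactly; for negative exponents it truncates the discarded fractional bits, which is one common hardware choice among several (rounding is another).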
