HBUCNNA: Hybrid Binary-Unary Convolutional Neural Network Accelerator

Convolutional layers account for roughly 90% of the total computation in Convolutional Neural Networks (CNNs). Field-programmable gate arrays (FPGAs) have shown great potential for accelerating CNN inference. However, the high compute and memory bandwidth demands of today's CNNs make it difficult for FPGA platforms to deliver peak performance. In this paper, we propose a reconfigurable, parallel-pipelined Hybrid Binary-Unary CNN Accelerator (HBUCNNA) that implements low-cost, high-performance convolutional layers for a ResNet-18 architecture. We use the hybrid binary-unary method to implement banks of constant-coefficient multipliers, which in turn realize the convolutional kernels. Moreover, we propose hybrid binary-unary batch normalization units to further reduce total hardware cost. These two units reduce {area, area×delay} costs on average by {50%, 30%} and {44%, 65%}, respectively, compared to their conventional binary counterparts. The proposed accelerator stores control signals for reconfigurability instead of the numeric values of the weights, which reduces the memory footprint by 20% on average. Overall, the proposed HBUCNNA architecture reduces {area, latency, power, energy, area×delay} costs on average by {25.5%, 40%, 15%, 40%, 47%} and {53%, 61%, 47%, 62%, 67%} compared to constant-coefficient-multiplier-based and variable-size-multiplier-based binary architectures, respectively. Moreover, the proposed accelerator improves throughput by about 1.4× over both of these architectures.
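The key property behind unary constant-coefficient multipliers can be illustrated in software. The following minimal Python sketch (all function names are hypothetical and for illustration only, not the paper's implementation) shows that, in the thermometer-coded unary domain, multiplying by a fixed constant reduces to pure bit routing with no arithmetic logic; the actual hybrid binary-unary hardware additionally partitions the input range into binary-addressed sub-regions, which is not modeled here.

```python
def to_thermometer(x, n_levels):
    """Thermometer (unary) code of x in [0, n_levels): bit i is 1 iff x > i."""
    return [1 if x > i else 0 for i in range(n_levels - 1)]

def from_thermometer(bits):
    """Decode a thermometer code by counting its 1 bits."""
    return sum(bits)

def const_mult_routing(c, n_in, n_out):
    """Wiring for f(x) = round(c * x). Output thermometer bit j is 1 iff
    f(x) > j; since f is monotonically non-decreasing, that equals
    x >= t_j for the smallest t_j with f(t_j) > j, i.e. output bit j is
    simply the input bit at index t_j - 1. The 'multiplier' is wires only.
    n_out must be small enough that every output bit is reachable."""
    routing = []
    for j in range(n_out - 1):
        t_j = next(x for x in range(n_in) if round(c * x) > j)
        routing.append(t_j - 1)  # index of the input bit to forward
    return routing

def unary_const_mult(x, c, n_in, n_out):
    in_bits = to_thermometer(x, n_in)
    wires = const_mult_routing(c, n_in, n_out)
    return from_thermometer([in_bits[w] for w in wires])

# Example: scale 4-bit inputs (0..15) by the constant 0.75 via routing alone.
for x in range(16):
    assert unary_const_mult(x, 0.75, 16, 12) == round(0.75 * x)
print("constant multiplication by 0.75 realized as pure bit routing")
```

Because the routing table depends only on the constant, a bank of such multipliers can be built per kernel coefficient, which is consistent with the abstract's note that the accelerator stores control signals rather than weight values.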