Binarized Encoder-Decoder Network and Binarized Deconvolution Engine for Semantic Segmentation

Semantic segmentation based on deep neural networks (DNNs) has attracted attention for its high accuracy and has been studied extensively. However, DNN-based segmentation studies have focused mainly on improving accuracy, which has greatly increased the computational demand and memory footprint of segmentation networks. As a result, these networks require substantial hardware resources and power, and are difficult to deploy in resource-constrained environments such as embedded systems. In this paper, we propose a binarized encoder-decoder network (BEDN) and a binarized deconvolution engine (BiDE) that accelerates the network to realize low-power, real-time semantic segmentation. BiDE implements a binarized segmentation network in custom hardware, greatly reducing hardware resource usage and greatly increasing throughput. The deconvolution used for upsampling in a segmentation network involves zero padding. To enable deconvolution in a binarized segmentation network, which cannot represent zero, we introduce zero-aware binarized deconvolution, which skips padded zero activations, and zero-aware batch normalization embedded binary activation, which accounts for the zero-skipped convolution. BEDN, the binarized segmentation network designed to be accelerated on BiDE, achieves acceptable accuracy while greatly reducing the computational and memory demands of the segmentation network through full binarization and a simple structure. BEDN has a network size of 0.21 MB, and its maximum memory usage is 1.38 MB. BiDE was implemented on a Xilinx ZU7EV field-programmable gate array (FPGA) operating at 187.5 MHz. BiDE accelerated the proposed BEDN on CamVid11 images of $480 \times 360$ size at 25.89 frames per second (FPS), achieving a performance of 1.682 tera operations per second (TOPS) and 824 giga operations per second per watt (GOPS/W).
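
The idea of skipping padded zeros in a binarized deconvolution can be illustrated with a minimal NumPy sketch. It is not the BiDE datapath: the function names, the single-channel 2-D layout, the stride-2 zero insertion, and the 3x3 kernel are all assumptions made for illustration. Activations and weights in {-1, +1} are stored as bits, a validity mask marks the positions that hold real activations, and the XNOR-popcount dot product counts only those positions.

```python
import numpy as np


def zero_insert(x_pm1, stride=2, pad=1):
    """Place a {-1,+1} map on a zero-inserted grid (hypothetical helper).

    Returns the sign bits and a mask that is True only where a real
    activation sits; every other position is an inserted or padded zero.
    """
    h, w = x_pm1.shape
    shape = (h * stride + 2 * pad, w * stride + 2 * pad)
    bits = np.zeros(shape, dtype=bool)
    mask = np.zeros(shape, dtype=bool)
    rows = slice(pad, pad + h * stride, stride)
    cols = slice(pad, pad + w * stride, stride)
    bits[rows, cols] = x_pm1 > 0
    mask[rows, cols] = True
    return bits, mask


def zero_aware_deconv(x_pm1, w_pm1, stride=2, pad=1):
    """XNOR-popcount deconvolution that skips masked (zero) positions."""
    bits, mask = zero_insert(x_pm1, stride, pad)
    w_bits = w_pm1 > 0
    k = w_pm1.shape[0]
    out_h, out_w = bits.shape[0] - k + 1, bits.shape[1] - k + 1
    out = np.zeros((out_h, out_w), dtype=np.int32)
    for i in range(out_h):
        for j in range(out_w):
            a = bits[i:i + k, j:j + k]
            m = mask[i:i + k, j:j + k]
            agree = ~(a ^ w_bits) & m          # XNOR, valid positions only
            # +1 per agreement, -1 per disagreement, 0 for skipped zeros
            out[i, j] = 2 * int(agree.sum()) - int(m.sum())
    return out


def reference_deconv(x_pm1, w_pm1, stride=2, pad=1):
    """Integer reference: the same sliding window with explicit zeros."""
    h, w = x_pm1.shape
    k = w_pm1.shape[0]
    up = np.zeros((h * stride + 2 * pad, w * stride + 2 * pad), dtype=np.int32)
    up[pad:pad + h * stride:stride, pad:pad + w * stride:stride] = x_pm1
    out_h, out_w = up.shape[0] - k + 1, up.shape[1] - k + 1
    out = np.zeros((out_h, out_w), dtype=np.int32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = int((up[i:i + k, j:j + k] * w_pm1).sum())
    return out


rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(4, 4))   # toy binarized feature map
w = rng.choice([-1, 1], size=(3, 3))   # toy binarized kernel
y = zero_aware_deconv(x, w)
print(y.shape, np.array_equal(y, reference_deconv(x, w)))
```

Because the number of valid positions differs from window to window, any batch normalization folded into a sign threshold on the popcount would have to take that count into account. This is one plausible reading of why the activation is described as zero-aware, not a statement of how BiDE implements it.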
