Scale-out Acceleration for 3D CNN-based Lung Nodule Segmentation on a Multi-FPGA System

Three-dimensional convolutional neural networks (3D CNNs) have become a promising method for lung nodule segmentation. Their high computational complexity and memory requirements make them challenging to accelerate on a single FPGA. In this work, we accelerate 3D CNN-based lung nodule segmentation on a multi-FPGA platform by proposing an efficient mapping scheme that exploits the massive parallelism of the platform while maximizing the computational efficiency of the accelerators. Experimental results show that our system, which integrates four Xilinx VCU118 boards, achieves state-of-the-art performance of 14.5 TOPS, along with a 29.4x performance gain over a CPU and 10.5x better energy efficiency than a GPU.

CCS Concepts • Computer systems organization → Special purpose systems.
