CNN Architecture for Surgical Image Segmentation with Recursive Structure and Flip-Based Upsampling

Laparoscopic surgery, a less invasive form of camera-aided surgery, is now commonly performed. However, it requires a camera assistant to hold and maneuver the laparoscope. If a robot controls the laparoscope automatically, a surgeon can operate without a camera assistant, which would be beneficial in areas suffering from a shortage of surgeons. In this paper, a prototype image segmentation architecture based on a convolutional neural network (CNN) is proposed to realize automated laparoscope control for cholecystectomy. Since the training dataset is annotated manually by a few surgeons, its scale is limited compared to those of common CNN-based systems. Therefore, we built a recursive network structure, in which some sub-networks are used multiple times, to mitigate overfitting. In addition, instead of the common transposed convolution, flip-based subpixel reconstruction is introduced into the upsampling layers. Furthermore, we applied stochastic depth regularization to the recursive structure for better accuracy. Evaluation results revealed that these improvements yield better classification accuracy without increasing the number of parameters. The system achieves a throughput sufficient for real-time laparoscope robot control with a single NVIDIA GeForce GTX 1080 GPU.
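
The following is a minimal NumPy sketch, not the authors' implementation, of the three ideas named above under simplifying assumptions. A single weight matrix `w` stands in for a convolutional sub-network and is reused on every recursion, so the parameter count does not grow with depth. Stochastic depth randomly skips the residual branch during training. The upsampling step shows the standard subpixel (pixel shuffle) rearrangement only; the flip-based variant is not detailed in this abstract and is not reproduced here. All function names and parameters (`shared_block`, `recursive_forward`, `pixel_shuffle`, `survival_prob`) are hypothetical.

```python
import numpy as np

def shared_block(x, w):
    """Toy residual branch: linear map + ReLU (stand-in for a conv sub-network)."""
    return np.maximum(x @ w, 0.0)

def recursive_forward(x, w, n_recursions, survival_prob, training, rng):
    """Apply the SAME weights `w` n_recursions times, with stochastic depth."""
    for _ in range(n_recursions):
        if training:
            # Stochastic depth: keep the residual branch with probability
            # survival_prob, otherwise pass x through unchanged.
            if rng.random() < survival_prob:
                x = x + shared_block(x, w)
        else:
            # At inference every branch is kept, scaled by its survival probability.
            x = x + survival_prob * shared_block(x, w)
    return x

def pixel_shuffle(x, r):
    """Standard subpixel rearrangement: (C*r*r, H, W) -> (C, H*r, W*r)."""
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)        # split channels into an r-by-r sub-grid
    x = x.transpose(0, 3, 1, 4, 2)      # reorder to (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)   # interleave sub-grid into space

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=(8, 8))   # the only parameters, shared by all recursions
    features = rng.normal(size=(4, 8))
    out = recursive_forward(features, w, n_recursions=3,
                            survival_prob=0.8, training=True, rng=rng)
    print(out.shape, "parameters:", w.size)

    low_res = rng.normal(size=(4, 5, 5))     # 4 channels = 1 output channel * 2 * 2
    print(pixel_shuffle(low_res, r=2).shape)
```

In this sketch, raising `n_recursions` deepens the computation while `w.size` stays fixed, which illustrates how reusing a sub-network can add depth without adding parameters.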
