A dataset of laryngeal endoscopic images with comparative study on convolution neural network-based semantic segmentation

Purpose: Automated segmentation of anatomical structures in medical image analysis is a prerequisite for autonomous diagnosis as well as various computer- and robot-aided interventions. Recent methods based on deep convolutional neural networks (CNN) have outperformed former heuristic methods. However, those methods were primarily evaluated on rigid, real-world environments. In this study, existing segmentation methods were evaluated for their use on a new dataset of transoral endoscopic exploration.

Methods: Four machine learning-based methods SegNet, UNet, ENet and ErfNet were trained with supervision on a novel 7-class dataset of the human larynx. The dataset contains 536 manually segmented images from two patients during laser incisions. The Intersection-over-Union (IoU) evaluation metric was used to measure the accuracy of each method. Data augmentation and network ensembling were employed to increase segmentation accuracy. Stochastic inference was used to show uncertainties of the individual models. Patient-to-patient transfer was investigated using patient-specific fine-tuning.

Results: In this study, a weighted average ensemble network of UNet and ErfNet was best suited for the segmentation of laryngeal soft tissue with a mean IoU of 84.7%. The highest efficiency was achieved by ENet with a mean inference time of 9.22 ms per image. It is shown that 10 additional images from a new patient are sufficient for patient-specific fine-tuning.

Conclusion: CNN-based methods for semantic segmentation are applicable to endoscopic images of laryngeal soft tissue. The segmentation can be used for active constraints or to monitor morphological changes and autonomously detect pathologies. Further improvements could be achieved by using a larger dataset or training the models in a self-supervised manner on additional unlabeled data.
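
The two quantitative ideas named in the abstract, the mean IoU metric and a weighted average ensemble of two networks, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `mean_iou` and `weighted_ensemble`, the choice to skip classes absent from both maps, and the example weights are assumptions made only for illustration. The abstract does not state how the ensemble weights were chosen or how IoU was averaged.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes present in pred or target.

    Assumption: classes absent from both label maps are skipped rather than
    counted as IoU 0 or 1; the paper may use a different convention.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class does not occur in either map
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

def weighted_ensemble(prob_maps, weights):
    """Weighted average of per-model class probabilities, then per-pixel argmax.

    prob_maps: list of arrays shaped (classes, H, W), one per model.
    weights:   one non-negative weight per model (normalized here).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack(prob_maps)                  # (models, classes, H, W)
    avg = np.tensordot(weights, stacked, axes=1)   # (classes, H, W)
    return avg.argmax(axis=0)                      # (H, W) label map

# Toy usage with hypothetical 2-class outputs on a 1x2 image.
model_a = np.array([[[0.9, 0.4]], [[0.1, 0.6]]])
model_b = np.array([[[0.2, 0.3]], [[0.8, 0.7]]])
labels = weighted_ensemble([model_a, model_b], weights=[0.75, 0.25])
score = mean_iou(labels, np.array([[0, 1]]), num_classes=2)
```

In practice the probability maps would be the softmax outputs of the trained networks (for example UNet and ErfNet), and the weights would typically be tuned on a validation split.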
