Automatic Glottis Detection and Segmentation in Stroboscopic Videos Using Convolutional Networks

Laryngeal videostroboscopy is widely used to analyze glottal vibration patterns, and this analysis plays a crucial role in the diagnosis of voice disorders. Studying these patterns with automatic glottis segmentation methods avoids subjectivity in diagnosis, and glottis detection is an essential step before segmentation. This paper considers the problem of automatic glottis segmentation using U-Net based deep convolutional networks. For accurate glottis detection, we train a fully convolutional network on a large number of glottal and non-glottal images. For glottis segmentation, we consider U-Net with three weight initialization schemes: 1) Random weight Initialization (RI), 2) Detection Network weight Initialization (DNI), and 3) Detection Network encoder frozen weight Initialization (DNIFr), each applied to two architectures: 1) U-Net without skip connections (UWSC) and 2) U-Net with skip connections (USC). Experiments on data from 22 subjects show that the performance of the glottis segmentation network improves when its weights are initialized from those of the glottis detection network. Among all schemes, USC with DNI yields an average localization accuracy of 81.3% and a Dice score of 0.73, which are better than those of the baseline approach by 15.87% and 0.07 (absolute), respectively.
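The Dice score reported above is not defined in the abstract. As a minimal sketch, assuming the standard definition 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B, it can be computed as follows. The function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks (nested lists of 0/1).

    Assumes the standard definition 2*|A & B| / (|A| + |B|).
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as glottis in both masks
    pred_area = 0     # pixels marked as glottis in the prediction
    target_area = 0   # pixels marked as glottis in the ground truth
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p, t = int(bool(p)), int(bool(t))
            intersection += p * t
            pred_area += p
            target_area += t

    if pred_area + target_area == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / (pred_area + target_area)


# Toy 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each mask.
predicted = [[0, 1, 1],
             [0, 1, 0]]
annotated = [[0, 1, 0],
             [0, 1, 1]]
score = dice_score(predicted, annotated)
```

A score of 1 means the predicted and annotated glottis regions coincide exactly, and 0 means they do not overlap at all.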
