Automatic Glottis Localization and Segmentation in Stroboscopic Videos Using Deep Neural Network

Accurate analysis of the glottal vibration pattern is vital for assessing voice pathologies. One of the primary steps in this analysis is automatic glottis segmentation, which in turn has two main parts: glottis localization and glottis segmentation. In this paper, we propose a deep neural network (DNN) based automatic glottis localization and segmentation scheme. We pose the task as a classification problem in which the colors of each pixel and its neighborhood are classified as belonging to the inside or the outside of the glottis region. We then post-process the classification result to obtain the biggest cluster, which is declared the segmented glottis. The proposed algorithm is evaluated on a dataset comprising stroboscopic videos from 18 subjects, in which the glottis region is marked by three Speech Language Pathologists (SLPs). On average, the proposed DNN based segmentation scheme achieves a localization performance of 65.33% and a segmentation DICE score of 0.74 (absolute), which are better than the baseline scheme by 22.66% and 0.09, respectively. We also find that the DICE score obtained by the DNN based segmentation scheme correlates well with the average DICE score computed between annotations provided by any two SLPs, suggesting the robustness of the proposed glottis segmentation scheme.
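
The two steps that follow pixel classification, keeping the biggest cluster and scoring it with DICE, can be illustrated with a minimal sketch. It assumes the classifier output is already available as a binary mask and that clusters are 4-connected; the abstract specifies neither, and the function names `largest_cluster` and `dice_score` are hypothetical. The DNN itself is not reproduced here.

```python
from collections import deque

import numpy as np


def largest_cluster(mask):
    """Keep only the largest 4-connected cluster of True pixels (assumed connectivity)."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = mask.shape
    visited = np.zeros_like(mask)
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or visited[r, c]:
                continue
            # Breadth-first search over one cluster of glottis-labelled pixels.
            queue = deque([(r, c)])
            visited[r, c] = True
            cluster = []
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny, nx] and not visited[ny, nx]):
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            if len(cluster) > len(best):
                best = cluster
    out = np.zeros_like(mask)
    for y, x in best:
        out[y, x] = True
    return out


def dice_score(pred, ref):
    """DICE = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    total = pred.sum() + ref.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, ref).sum() / total)
```

For example, `dice_score(largest_cluster(pred_mask), slp_mask)` would compare a post-processed prediction against one annotator's mask, where `pred_mask` and `slp_mask` are placeholder arrays of the same shape.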
