Segmentation of the glottal space from laryngeal images using the watershed transform

This work describes a new method for the automatic detection of the glottal space in laryngeal images obtained with either high-speed or conventional video cameras attached to a laryngoscope. The detection combines several established digital image processing techniques: the image is segmented with a watershed transform followed by region merging, and the final decision is taken using a simple linear predictor. This scheme successfully segmented the glottal space in all the test images used. The method can be considered a generalist approach to glottal space segmentation because, in contrast with other methods found in the literature, it requires neither initialization nor strict environmental conditions on the images to be processed. Its main advantage is therefore that the user does not have to outline the region of interest with a mouse click. Some a priori knowledge about the glottal space is still needed, but it can be considered weak compared to the environmental conditions imposed in earlier works.
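The region-merging stage that follows the watershed transform can be illustrated with a minimal sketch. It assumes that an over-segmented label map is already available and that two 4-adjacent regions are merged when their mean intensities differ by less than a threshold. The function name `merge_regions`, the mean-intensity criterion, and the `threshold` parameter are illustrative assumptions, not the merging criterion used in this work, and the linear predictor that takes the final decision is not shown.

```python
def merge_regions(labels, image, threshold):
    """Greedily merge 4-adjacent regions with similar mean intensity.

    labels: 2D list of integer region labels (e.g. watershed output).
    image: 2D list of pixel intensities, same shape as labels.
    threshold: hypothetical merging criterion; two adjacent regions are
        merged while their mean intensities differ by less than this value.
    Returns a new 2D list of labels after merging.
    """
    rows, cols = len(labels), len(labels[0])

    # Accumulate intensity sum and pixel count per region.
    total, count = {}, {}
    for r in range(rows):
        for c in range(cols):
            lab = labels[r][c]
            total[lab] = total.get(lab, 0.0) + image[r][c]
            count[lab] = count.get(lab, 0) + 1

    # Build the region adjacency graph from 4-connected pixel pairs.
    neighbours = {lab: set() for lab in total}
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        neighbours[a].add(b)
                        neighbours[b].add(a)

    parent = {lab: lab for lab in total}

    def find(lab):
        # Follow merge links to the surviving label.
        while parent[lab] != lab:
            lab = parent[lab]
        return lab

    while True:
        # Find the most similar pair of adjacent regions below the threshold.
        best = None
        for a in neighbours:
            for b in neighbours[a]:
                if a < b:
                    diff = abs(total[a] / count[a] - total[b] / count[b])
                    if diff < threshold and (best is None or diff < best[0]):
                        best = (diff, a, b)
        if best is None:
            break
        _, a, b = best

        # Merge region b into region a and update statistics and adjacency.
        total[a] += total.pop(b)
        count[a] += count.pop(b)
        for n in neighbours.pop(b):
            neighbours[n].discard(b)
            if n != a:
                neighbours[n].add(a)
                neighbours[a].add(n)
        parent[b] = a

    return [[find(lab) for lab in row] for row in labels]
```

With a label map that splits a dark area and a bright area into two regions each, a moderate threshold should merge each pair while keeping the dark and bright areas apart. In a glottal image, the dark merged region would then be one of the candidates passed on to the decision stage.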
