Multi‐hypothesis tracking of the tongue surface in ultrasound video recordings of normal and impaired speech

Highlights
A new method based on particle filtering robustly tracks the tongue in US images.
The method does not require a large or very diverse training set.
The method is accurate in both normal and impaired speech US data.
It yields low distance error and high shape correlation compared to ground truth.
Its accuracy is an improvement over Edgetrak, Autotrace and TongueTrack.

Abstract
Characterizing tongue shape and motion, as they appear in real‐time ultrasound (US) images, is of interest to the study of healthy and impaired speech production. Quantitative analysis of tongue shape and motion requires that the tongue surface be extracted in each frame of US speech recordings. While the literature proposes several automated methods for this purpose, these either require large or very well matched training sets, or lack robustness in the presence of rapid tongue motion. This paper presents a new robust method for tongue tracking in US images. It combines simple tongue shape and motion models, derived from a small training data set, with a highly flexible active contour (snake) representation, and it maintains multiple hypotheses as to the correct tongue contour via a particle filtering algorithm. The method was tested on a database of large free speech recordings from healthy and impaired speakers, and its accuracy was measured against manual segmentations obtained for every image in the database. The proposed method achieved a mean sum of distances error of 1.69 ± 1.10 mm, and its accuracy was not highly sensitive to training set composition. Furthermore, the proposed method showed improved accuracy, both in terms of mean sum of distances error and in terms of linguistically meaningful shape indices, compared to the three publicly available tongue tracking software packages Edgetrak, TongueTrack and Autotrace.
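The abstract reports accuracy as a mean sum of distances error but does not define it. As a rough illustration only, the sketch below assumes the symmetric nearest-point form often used in the tongue tracking literature: each point on one contour is matched to its closest point on the other, in both directions, and the distances are averaged. The function name `mean_sum_of_distances` and the contour format (lists of (x, y) points in mm) are assumptions made for this example, not details taken from the paper.

```python
import math


def mean_sum_of_distances(u, v):
    """Symmetric mean nearest-point distance between two contours.

    u, v: non-empty sequences of (x, y) points, assumed to be in mm.
    This is an assumed definition for illustration, not the paper's code.
    """
    def nearest(p, contour):
        # Distance from point p to the closest point on the other contour.
        return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in contour)

    total = sum(nearest(p, v) for p in u) + sum(nearest(q, u) for q in v)
    return total / (len(u) + len(v))


# Hypothetical contours: a tracked contour 1 mm above a manual one.
manual = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
tracked = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
error_mm = mean_sum_of_distances(manual, tracked)
```

Under this assumed definition the error is symmetric in its two arguments and is zero only when every point of each contour coincides with a point of the other.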
