Real-time Automatic Tongue Contour Tracking in Ultrasound Video for Guided Pronunciation Training

Ultrasound technology is safe, relatively affordable, and capable of real-time performance. Recently, it has been employed to visualize tongue function for second language education, where visual feedback of tongue motion complements conventional audio feedback. Recognizing the tongue shape in noisy, low-contrast ultrasound images, however, requires expertise that non-expert users lack. To alleviate this problem, the tongue dorsum can be tracked and visualized automatically. Yet the rapidity and complexity of tongue gestures, together with the low quality of ultrasound images, make this a challenging task for real-time applications. Deep convolutional neural networks have been applied successfully in a variety of computer vision applications, and they offer a promising alternative for real-time automatic tongue contour tracking in ultrasound video. In this paper, we propose a guided language training system that uses our automatic segmentation approach to highlight the tongue contour region in ultrasound images and superimpose it on the face profile of a language learner for better tongue localization. Assessments of the system revealed its flexibility and efficiency for training the pronunciation of difficult words via tongue function visualization. Moreover, our tongue tracking technique exceeds the other methods considered in terms of performance and accuracy.
