Quantification of Transducer Misalignment in Ultrasound Tongue Imaging

In speech production research, different imaging modalities have been employed to obtain accurate information about the movement and shaping of the vocal tract. Ultrasound is an affordable and non-invasive imaging modality with relatively high temporal and spatial resolution, which makes it well suited to studying the dynamic behavior of the tongue during speech production. However, a long-standing problem in ultrasound tongue imaging is transducer misalignment during longer recording sessions. In this paper, we propose a simple yet effective approach to quantifying misalignment. The analysis employs one distance measure, Mean Square Error (MSE), and two similarity measures, the Structural Similarity Index (SSIM) and Complex Wavelet SSIM (CW-SSIM), to identify relative displacement between the chin and the transducer. We visualize these measures as a function of the timestamp of the utterances. Extensive experiments are conducted on a Hungarian dataset and a Scottish English child dataset. The results suggest that large MSE values and small SSIM and CW-SSIM values indicate corruption or other issues in the recordings, which can be caused by either transducer misalignment or lack of gel.
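
As a rough illustration of the measures named above, the following Python sketch computes MSE and a simplified SSIM between a reference ultrasound image and later images. Several things here are assumptions rather than details from the paper: that each utterance is summarized by a single image compared against a reference from the start of the session, the function names `mse`, `global_ssim`, and `misalignment_scores`, and the use of one global window for SSIM instead of the usual sliding local windows. CW-SSIM is omitted because it requires a complex wavelet decomposition.

```python
import numpy as np


def mse(a, b):
    """Mean Square Error between two images of equal shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def global_ssim(a, b, data_range=255.0):
    """Simplified SSIM using whole-image statistics (a single window).

    Standard SSIM averages this quantity over local windows; the global
    variant is an assumption made here to keep the sketch short.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Stabilizing constants with the commonly used factors 0.01 and 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)


def misalignment_scores(images, reference):
    """Return one (MSE, SSIM) pair per image, each compared to the reference."""
    return [(mse(img, reference), global_ssim(img, reference)) for img in images]


# Synthetic demo: a random "reference" image and a vertically shifted copy,
# standing in for an image recorded after the transducer has moved.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
shifted = np.roll(reference, 8, axis=0)

scores = misalignment_scores([reference, shifted], reference)
for label, (m, s) in zip(["unchanged", "shifted"], scores):
    print(f"{label}: MSE={m:.1f}, SSIM={s:.3f}")
```

Plotting such scores against utterance timestamps would give a curve of the kind the abstract describes, where a jump in MSE together with a drop in SSIM flags a possible misalignment. Any threshold for "large" or "small" would have to be chosen per dataset.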
