Multi-modal recording and modeling of vocal tract movements

The complexity of vocal tract movement makes it difficult to record complete information about the vocal tract during speech. Dynamic articulation has been acquired with a variety of instruments, each of which has its own advantages and shortcomings. Because vocal tract movements are difficult to measure with any single recording technique, multiple instruments have come to be applied simultaneously. We therefore used an ultrasound system in combination with an electromagnetic articulography (EMA) system to record multi-modal tongue movement. Vocal tract movement data were obtained with an ultrasound-based speech recording system that we developed, which records ultrasound images in synchrony with the audio signal. The EMA system likewise collects articulatory data simultaneously with the audio. The EMA and ultrasound data were registered and matched to the same audio signal, after which the two datasets were fused at each time point. In addition, we propose a method for vocal tract shape reconstruction and modeling on the ultrasound dataset using an active shape model. The average reconstruction error does not exceed 1.26 mm.
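As a rough illustration of the shape-modeling step, the sketch below builds the statistical core of an active shape model (a PCA point distribution model) from a set of tongue contours and reports the mean point-wise reconstruction error. It is not the authors' pipeline: the contours are synthetic, the function names (`make_synthetic_contours`, `fit_shape_model`, `reconstruct`, `mean_point_error`) are hypothetical, and the image search stage of an active shape model is omitted.

```python
import numpy as np


def make_synthetic_contours(n_samples=40, n_points=20, seed=0):
    """Generate toy midsagittal tongue contours in mm (assumption: not real data)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 60.0, n_points)
    a = rng.normal(size=(n_samples, 1))  # weight of first deformation pattern
    b = rng.normal(size=(n_samples, 1))  # weight of second deformation pattern
    y = (20.0 * np.sin(np.pi * x / 60.0)
         + 5.0 * a * np.cos(np.pi * x / 60.0)
         + 3.0 * b * np.sin(2.0 * np.pi * x / 60.0)
         + rng.normal(scale=0.1, size=(n_samples, n_points)))
    xs = np.broadcast_to(x, (n_samples, n_points))
    return np.stack([xs, y], axis=-1)  # shape: (n_samples, n_points, 2)


def fit_shape_model(shapes, n_modes):
    """Return the mean shape vector and the first n_modes principal modes."""
    X = shapes.reshape(len(shapes), -1)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal modes of variation, ordered by variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_modes]


def reconstruct(shapes, mean, modes):
    """Project contours onto the modes and map them back to point coordinates."""
    X = shapes.reshape(len(shapes), -1)
    b = (X - mean) @ modes.T  # shape parameters for each contour
    return (mean + b @ modes).reshape(shapes.shape)


def mean_point_error(a, b):
    """Mean Euclidean distance between corresponding contour points (mm)."""
    return float(np.linalg.norm(a - b, axis=-1).mean())


if __name__ == "__main__":
    shapes = make_synthetic_contours()
    for k in (1, 2, 4):
        mean, modes = fit_shape_model(shapes, k)
        err = mean_point_error(shapes, reconstruct(shapes, mean, modes))
        print(f"{k} modes: mean reconstruction error = {err:.3f} mm")
```

Keeping more modes lowers the reconstruction error at the cost of a less compact model; with real contours, the number of modes would typically be chosen from the explained variance rather than fixed in advance.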
