Using Statistical Deformable Models to Reconstruct Vocal Tract Shape from Magnetic Resonance Images

The mechanisms involved in speech production are complex and have therefore attracted growing attention from the scientific community. Magnetic resonance imaging (MRI) has been shown to be a powerful tool for understanding the morphology of the vocal tract. Over the last few years, statistical deformable models have been used successfully to identify and characterize bones and organs in medical images, and point distribution models (PDMs) have gained particular relevance. In this work, the suitability of these models was studied for characterizing, and subsequently reconstructing, the shape of the vocal tract from MR images during the articulation of speech sounds of European Portuguese (EP), a variety of one of the most widely spoken languages worldwide. To this end, a PDM was built from a set of MR images acquired during the artificially sustained articulation of 25 EP speech sounds. The capacity of this statistical model to characterize the shape deformation of the vocal tract during the production of sounds was then analysed. Next, the model was used to reconstruct five EP oral vowels and the EP fricative consonants. Within speech production research, this study is believed to be the first to characterize and reconstruct the vocal tract shape from MR images by using PDMs. In addition, the findings indicate that this modelling technique supports an enhanced understanding of the dynamic speech events involved in sustained articulations imaged by MRI, which is of particular interest for speech rehabilitation and simulation.
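As an illustration of the modelling idea summarized above, the following is a minimal sketch of a generic PDM in Python with NumPy. It is not the authors' implementation. It assumes that each vocal tract contour has already been reduced to the same number of corresponding 2D landmark points and that the shapes have been aligned beforehand; the function names `build_pdm` and `reconstruct`, the 95% variance threshold, and the ±3 standard deviation limit on the mode weights are illustrative choices.

```python
import numpy as np

def build_pdm(shapes, variance_kept=0.95):
    """Build a point distribution model from aligned landmark shapes.

    shapes: array of shape (n_shapes, n_points, 2), assumed already aligned.
    Returns the mean shape vector, the retained modes of variation
    (one per column) and the variance explained by each retained mode.
    """
    n_shapes = shapes.shape[0]
    X = shapes.reshape(n_shapes, -1)           # one flattened shape per row
    mean = X.mean(axis=0)
    # PCA via SVD of the centred data; rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = s ** 2 / (n_shapes - 1)
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of modes whose cumulative variance reaches the threshold.
    t = min(int(np.searchsorted(ratio, variance_kept)) + 1, len(eigvals))
    return mean, Vt[:t].T, eigvals[:t]

def reconstruct(shape, mean, modes, eigvals, limit=3.0):
    """Project a shape onto the model and rebuild it from the mode weights.

    Weights are clipped to +/- limit standard deviations so that the
    rebuilt shape stays within the variation seen in the training set.
    """
    b = modes.T @ (shape.reshape(-1) - mean)   # mode weights
    bound = limit * np.sqrt(eigvals)
    b = np.clip(b, -bound, bound)
    return (mean + modes @ b).reshape(-1, 2), b
```

In this sketch a shape is approximated as the mean shape plus a weighted sum of the retained modes, so a new contour can be described by a short vector of weights rather than by all of its landmark coordinates.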
