Normalisation articulatoire du locuteur par méthodes de décomposition tri-linéaire basées sur des données IRM (Articulatory speaker normalisation based on MRI-data using three-way linear decomposition methods) [in French]

The aim of this study was to characterise, model and compare the lingual articulatory strategies of a group of speakers. Individual principal component analysis (PCA) models and multi-linear decomposition methods were applied to tongue contours extracted from a magnetic resonance imaging (MRI) corpus of seven speakers articulating 63 French vowels and consonants. Averaged over the seven speakers and using 4 components, the root mean square prediction error (RMSE) was 0.13 cm for the individual PCA models and 0.29 cm for the parallel factor (PARAFAC) model, corresponding to 91% and 62% of the variance explained, respectively. A multi-linear regression (MRL) model with 10 components could predict the tongue contour of a target subject from that of a given source subject, with about 65% of the variance explained and an RMSE of 0.38 cm. All models were assessed using a leave-one-out cross-validation procedure.
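The two figures of merit quoted in the abstract, RMSE and percentage of variance explained, together with the leave-one-out procedure, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data and the mean-shape baseline used as a stand-in model are all assumptions made for illustration, and the abstract does not state exactly how the variance explained was computed (here it is taken as 1 minus the ratio of squared error to total variance about the mean).

```python
import math
from typing import Callable, List

Contour = List[float]  # one flattened tongue contour (hypothetical layout)


def rmse(observed: List[Contour], predicted: List[Contour]) -> float:
    """Root mean square error over all coordinates of all contours."""
    squared = [
        (o - p) ** 2
        for obs_row, pred_row in zip(observed, predicted)
        for o, p in zip(obs_row, pred_row)
    ]
    return math.sqrt(sum(squared) / len(squared))


def variance_explained(observed: List[Contour], predicted: List[Contour]) -> float:
    """1 - SSE/SST, with SST taken about the per-coordinate mean (assumed definition)."""
    n, dims = len(observed), len(observed[0])
    mean = [sum(row[j] for row in observed) / n for j in range(dims)]
    sse = sum(
        (o - p) ** 2
        for obs_row, pred_row in zip(observed, predicted)
        for o, p in zip(obs_row, pred_row)
    )
    sst = sum((o - m) ** 2 for row in observed for o, m in zip(row, mean))
    return 1.0 - sse / sst


def leave_one_out(
    data: List[Contour],
    fit_predict: Callable[[List[Contour], Contour], Contour],
) -> List[Contour]:
    """Predict each contour from a model fitted on all the other contours."""
    predictions = []
    for i, held_out in enumerate(data):
        training = data[:i] + data[i + 1:]
        predictions.append(fit_predict(training, held_out))
    return predictions


def mean_shape_baseline(training: List[Contour], held_out: Contour) -> Contour:
    """Placeholder model: predicts the mean training contour.

    A PCA or PARAFAC model would instead project `held_out` onto
    components estimated from `training`.
    """
    n = len(training)
    return [sum(row[j] for row in training) / n for j in range(len(held_out))]


# Toy data (invented): three contours of two coordinates each, in cm.
toy_contours = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
loo_predictions = leave_one_out(toy_contours, mean_shape_baseline)
print("RMSE (cm):", rmse(toy_contours, loo_predictions))
print("Variance explained:", variance_explained(toy_contours, loo_predictions))
```

With a baseline this crude the cross-validated variance explained can be negative, meaning the held-out predictions are worse than the global mean. A real component model would replace `mean_shape_baseline` while the scoring functions stay unchanged.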
