Relating Articulatory Motions in Different Speaking Rates

Movements of the articulators (e.g., tongue, lips, and jaw) at different speaking rates are related in a complex manner. In this work, we examine the underlying function that transforms the articulatory movements involved in producing speech at a neutral speaking rate into those at fast and slow speaking rates (N2F and N2S, respectively). For this, we use articulatory movement data collected from five subjects using an electromagnetic articulograph at neutral, fast, and slow speaking rates. As candidate transformation functions (TFs), we use an affine transformation with a diagonal matrix, an affine transformation with a full matrix, and a nonlinear function modeled by a deep neural network (DNN). Since the duration of an utterance typically differs across speaking rates, the articulatory movement trajectories must be time-aligned, and the alignment in turn affects the TF learnt. Therefore, we propose an iterative algorithm that alternately optimizes the TF and the time alignments. Subject-specific experiments reveal that while the N2F transformation is well described by an affine transformation with a full matrix, the N2S transformation is better represented by a more complex nonlinear function modeled by a DNN. This could be because subjects exhibit gross articulatory movements during fast speech and hyper-articulate while producing slow speech.
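The alternating optimization mentioned above can be illustrated with a minimal sketch. The abstract does not give the details of the algorithm, so everything here is an assumption made for illustration: the alignment step uses plain dynamic time warping (DTW) with a squared Euclidean frame cost, the TF is the full-matrix affine case fitted by least squares, and the function names (`dtw_path`, `fit_affine`, `alternate_tf_and_alignment`) are hypothetical.

```python
import numpy as np


def dtw_path(x, y):
    """Return a DTW warping path between x (n, d) and y (m, d) as index pairs."""
    n, m = len(x), len(y)
    # Pairwise squared Euclidean distances between frames.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last cell to the first to recover the path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1]


def fit_affine(src, tgt):
    """Least-squares affine map with a full matrix: tgt ~ src @ A + b."""
    src1 = np.hstack([src, np.ones((len(src), 1))])
    W, *_ = np.linalg.lstsq(src1, tgt, rcond=None)
    return W[:-1], W[-1]


def alternate_tf_and_alignment(neutral, target, n_iter=5):
    """Alternate between aligning (DTW) and refitting the affine TF.

    Assumed scheme, not the paper's exact algorithm: start from the
    identity TF, align the transformed neutral trajectory to the target,
    refit the TF on the aligned frame pairs, and repeat.
    """
    dim = neutral.shape[1]
    A, b = np.eye(dim), np.zeros(dim)
    for _ in range(n_iter):
        path = dtw_path(neutral @ A + b, target)
        idx_n, idx_t = map(list, zip(*path))
        A, b = fit_affine(neutral[idx_n], target[idx_t])
    return A, b, path
```

In this sketch both steps reduce the same quantity, the summed squared error over aligned frame pairs, so that error cannot increase from one iteration to the next, although the procedure may still settle in a local minimum. A diagonal-matrix or DNN-based TF would replace only `fit_affine`.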
