Examining visible articulatory features in clear and conversational speech

This study investigated the relationship between clear and conversational speech styles and the motion of visible articulators. Using state-of-the-art computer vision and image processing techniques, we examined front- and side-view videos of 18 native English speakers' faces while they recited six English words containing various vowels (keyed, kid, cod, cud, cooed, could) and extracted measurements corresponding to lip and jaw movements. Significant effects were found for style, gender, and saliency of visual speech cues. Clear speech exhibited longer vowel duration and more vertical lip stretching and jaw movement for all vowels, more horizontal lip stretching for front vowels, and a greater degree of lip protrusion for rounded vowels. Additionally, male speakers showed greater articulatory movements than female speakers in clear speech. These articulatory movement data demonstrate that speakers modify their speech productions in response to communicative needs in different speech contexts.
