May speech modifications in noise contribute to enhance audio-visible cues to segment perception?

In this study, we explore how the acoustic and lip articulatory characteristics of bilabial consonants and three extreme French vowels vary in Lombard speech. In the light of several theories of segment perception, we show that formant modifications should decrease the audio intelligibility of vowels in noise. In contrast, modifications in lip articulation should improve the visual intelligibility of vowels and bilabial consonants. This does not agree with previous studies, which reported a global increase in the intelligibility of Lombard speech, especially in the audio domain and much less in the visual one [1-3]. Thus, more detailed research is needed on the segmental and prosodic contributions to the increased intelligibility of Lombard speech.

Index Terms: Lombard speech, production, audiovisual cues.

[1] C. Kroos et al. Are there compensatory effects in natural speech, 1999.

[2] H. S. Gopal et al. A perceptual model of vowel recognition based on the auditory representation of American English vowels, 1986, The Journal of the Acoustical Society of America.

[3] N. I. Durlach et al. Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech, 1985, Journal of speech and hearing research.

[4] Eric Vatikiotis-Bateson et al. Audiovisual processing of Lombard speech, 2005, AVSP.

[5] H. Traunmüller. Perceptual dimension of openness in vowels, 1981, The Journal of the Acoustical Society of America.

[6] W. H. Sumby et al. Visual contribution to speech intelligibility in noise, 1954.

[7] Björn Lindblom et al. Speech transforms, 1992, Speech Commun.

[8] A. Liberman et al. The motor theory of speech perception revised, 1985, Cognition.

[9] L. Chistovich et al. The ‘center of gravity’ effect in vowel spectra and critical distance between the formants: Psychoacoustical study of the perception of vowel-like stimuli, 1979, Hearing Research.

[10] H. Lane et al. The Lombard Sign and the Role of Hearing in Speech, 1971.

[11] J. Liénard et al. Effect of vocal effort on spectral properties of vowels, 1999, The Journal of the Acoustical Society of America.

[12] J. C. Junqua et al. The Lombard reflex and its role on human listeners and automatic speech recognizers, 1993, The Journal of the Acoustical Society of America.

[13] T. H. Crystal et al. Linear prediction of speech, 1977, Proceedings of the IEEE.

[14] Jean-Claude Junqua et al. Acoustic and production pilot studies of speech vowels produced in noise, 1992, ICSLP.

[15] A. Malécot. Mechanical Pressure as an Index of ‘Force of Articulation’, 1966.

[16] René Carré et al. Modeling and Perception of ‘Gesture Reduction’, 2000, Phonetica.

[17] B. Granström et al. Music and Hearing Quarterly Progress and Status Report Some studies concerning perception of isolated vowels, 2007.

[18] Pascal Perrier et al. Compensation strategies for the perturbation of the rounded vowel [u] using a lip-tube: A study of the control space in speech production, 1995.

[19] Marion Dohen et al. The Lombard Effect: a physiological reflex or a controlled intelligibility enhancement?, 2006.

[20] M. Picheny et al. Speaking clearly for the hard of hearing. II: Acoustic characteristics of clear and conversational speech, 1986, Journal of speech and hearing research.

[21] Louis Goldstein et al. Articulatory gestures as phonological units, 1989, Phonology.

[22] B. Lindblom et al. Acoustical consequences of lip, tongue, jaw, and larynx movement, 1970, The Journal of the Acoustical Society of America.

[23] Richard Wright et al. The Hyperspace Effect: Phonetic Targets Are Hyperarticulated, 1993.

[24] Michael I. Jordan et al. Trading relations between tongue-body raising and lip rounding in production of the vowel /u/: a pilot "motor equivalence" study, 1993, The Journal of the Acoustical Society of America.

[25] P. Denes. On the Motor Theory of Speech Perception, 1965.

[26] Paul Boersma et al. Praat, a system for doing phonetics by computer, 2002.

[27] V. L. Gracco et al. Some organizational characteristics of speech movement control, 1994, Journal of speech and hearing research.

[28] C. Fowler. An event approach to the study of speech perception from a direct realist perspective, 1986.

[29] B. Lindblom et al. Role of articulation in speech perception: clues from production, 1996, The Journal of the Acoustical Society of America.

[30] P. Boersma. Praat: doing phonetics by computer (version 4.4.24), 2006.

[31] Mohamed Tahar Lallouache et al. Un poste "visage-parole" couleur : acquisition et traitement automatique des contours des lèvres, 1991.

[32] Marion Dohen et al. An acoustic and articulatory study of Lombard speech: global effects on the utterance, 2006, INTERSPEECH.

[33] Maëva Garnier. Communiquer en environnement bruyant : de l’adaptation jusqu’au forçage vocal, 2007.

[34] Jeesun Kim et al. Lombard speech: Auditory (A), Visual (V) and AV effects, 2006.

[35] Francisco Casacuberta et al. An analysis of general acoustic-phonetic features for Spanish speech produced with the Lombard effect, 1996, Speech Commun.

[36] S. Blumstein et al. Acoustic invariance in speech production: evidence from measurements of the spectral characteristics of stop consonants, 1979, The Journal of the Acoustical Society of America.

[37] John Smith et al. Interrelationship between vocal effort and vocal tract acoustics: a pilot study, 2008, INTERSPEECH.

[38] C. Benoît et al. Effects of phonetic context on audio-visual intelligibility of French, 1994, Journal of speech and hearing research.

[39] R. Schulman et al. Articulatory dynamics of loud and normal speech, 1989, The Journal of the Acoustical Society of America.

[40] P. Kuhl et al. Categorization of Speech by Infants: Support for Speech-Sound Prototypes, 1989.

[41] Maria Södersten et al. Cancellation of simulated environmental noise as a tool for measuring vocal performance during noise exposure, 2002, Journal of voice: official journal of the Voice Foundation.

[42] Jean-Luc Schwartz et al. A strong evidence for the existence of a large-scale integrated spectral representation in vowel perception, 1989, Speech Commun.

[43] Joan E. Sussman. A preliminary test of prototype theory for a [ba]‐to‐[da] continuum, 1993.

[44] J. Robert-Ribes et al. Complementarity and synergy in bimodal speech: auditory, visual, and audio-visual identification of French oral vowels in noise, 1998, The Journal of the Acoustical Society of America.