Can you 'read' tongue movements? Evaluation of the contribution of tongue display to speech understanding

Lip reading relies on visible articulators to ease speech understanding. However, lips and face alone provide very incomplete phonetic information: the tongue, which is generally hidden from view, carries an important part of the articulatory information that is not accessible through lip reading. The question is thus whether direct and complete vision of the tongue enables tongue reading. We therefore generated a set of audiovisual VCV stimuli with an audiovisual talking head that can display all speech articulators, including the tongue, in an augmented speech mode. The talking head is a virtual clone of a human speaker, and the articulatory movements were captured from the same speaker using ElectroMagnetic Articulography (EMA). These stimuli were presented to subjects in audiovisual perception tests under various presentation conditions (audio signal alone, cutaway profile view with or without the tongue, complete face) at various Signal-to-Noise Ratios (SNRs). The results indicate: (1) the possibility of implicit learning of tongue reading; (2) better consonant identification with the cutaway presentation when the tongue is displayed than when it is not; (3) no significant difference between the cutaway presentation with the tongue and the more ecological rendering of the complete face; (4) a predominance of lip reading over tongue reading; but (5) a certain natural human capability for tongue reading when the audio signal is strongly degraded or absent. We conclude that these tongue reading capabilities could be used in applications such as speech therapy for children with speech disorders, rehabilitation of speech perception and production in hearing-impaired children, and pronunciation training for second-language learners.

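As an illustration of the SNR manipulation mentioned above, the Python sketch below mixes a clean speech signal with a noise masker at a prescribed Signal-to-Noise Ratio. The function name, the white-noise masker, and the SNR values in the loop are illustrative assumptions for exposition, not details taken from the study.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix a speech signal with noise at a target SNR (in dB).

    Both inputs are 1-D float arrays at the same sample rate. The noise
    is tiled or truncated to the speech length, then scaled so that
    10 * log10(P_speech / P_noise_scaled) equals snr_db.
    """
    noise = np.resize(noise, speech.shape)   # match lengths
    p_speech = np.mean(speech ** 2)          # average signal power
    p_noise = np.mean(noise ** 2)            # average noise power
    # Scale the noise so the resulting mixture has the requested SNR.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Hypothetical usage: degrade a VCV utterance at several SNR levels.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)          # stand-in for a real recording
noise = rng.standard_normal(16000)           # white-noise masker (assumption)
for snr_db in (9, 3, -3, -9):                # illustrative SNR conditions
    degraded = mix_at_snr(speech, noise, snr_db)
```

Each degraded signal would then be paired with the corresponding visual presentation condition before being played to listeners.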