Can you "read tongue movements"?

Abstract: Lip reading relies on visible articulators to ease audiovisual speech understanding. However, lips and face alone provide very incomplete phonetic information: the tongue, which is generally not visible, carries an important part of the articulatory information that is not accessible through lip reading. The question was thus whether direct and full vision of the tongue allows tongue reading. We therefore generated a set of audiovisual VCV stimuli by controlling an audiovisual talking head that can display all speech articulators, including the tongue, in an augmented speech mode, from articulator movements tracked on a speaker. These stimuli were played to subjects in a series of audiovisual perception tests under various presentation conditions (audio signal alone, audiovisual signal with a profile cutaway display with or without the tongue, complete face) and at various signal-to-noise ratios (SNRs). The results show a certain implicit learning effect for tongue reading, a preference for the more ecological rendering of the complete face over the cutaway presentation, and a predominance of lip reading over tongue reading, but also the capability of tongue reading to take over when the audio signal is strongly degraded or absent. We conclude that these tongue reading capabilities could be used for applications in speech therapy for speech-retarded children, perception and production rehabilitation of hearing-impaired children, and pronunciation training for second-language learners.

Index Terms: lip reading, tongue reading, audiovisual speech perception, audiovisual talking head, hearing loss, augmented speech.
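The abstract does not specify how the audio track was degraded to each SNR level, so the following is only a minimal sketch of one common approach: scaling additive noise so that the speech-to-noise power ratio matches a target value in decibels, using SNR_dB = 10 log10(P_speech / P_noise). The function name `mix_at_snr` and the use of white noise are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db` (in dB),
    then add it to `speech`. Assumes both signals share the same sample rate."""
    noise = noise[: len(speech)]                       # trim noise to speech length
    p_speech = np.mean(speech ** 2)                    # mean power of the speech
    p_noise = np.mean(noise ** 2)                      # mean power of the raw noise
    target_p_noise = p_speech / (10 ** (snr_db / 10))  # noise power for target SNR
    scaled_noise = noise * np.sqrt(target_p_noise / p_noise)
    return speech + scaled_noise

# Example: degrade a stimulus to -9 dB SNR with white noise.
# The random arrays are placeholders; real stimuli would be recorded VCV utterances.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
degraded = mix_at_snr(speech, noise, snr_db=-9.0)
```

Because the scaling is derived from the measured power of each signal, the same routine works regardless of the recordings' absolute levels, which is why this style of mixing is common in audiovisual intelligibility testing.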
