Rigid vs non-rigid face and head motion in phone and tone perception

There is recent evidence that the visual concomitants not only of the articulation of phones (consonants and vowels) but also of tones (fundamental frequency variations that signal lexical meaning in tone languages) facilitate speech perception. Analysis of speech production data from a Cantonese speaker suggests that the source of this perceptual information for tones involves rigid motion of the head rather than non-rigid face motion. A perceptual discrimination study was conducted using OPTOTRAK output in which rigid head motion and non-rigid face motion could be presented independently, under two conditions: one in which the words to be discriminated differed only in tone, and another in which they differed only in phone. The results suggest that non-rigid motion is the critical determinant for successful discrimination of phones, whereas both non-rigid and rigid motion are required for the discrimination of tones.
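
The abstract does not spell out how rigid head motion was separated from non-rigid face motion in the OPTOTRAK marker data. The following is a minimal sketch, assuming a standard per-frame least-squares rigid-body (Kabsch/Procrustes) fit of a reference marker configuration, with the residual treated as the non-rigid component; the function names, the choice of the Kabsch algorithm, and the synthetic data are illustrative assumptions, not the authors' method.

```python
import numpy as np

def rigid_fit(ref, frame):
    """Least-squares rigid-body (rotation + translation) fit of the reference
    marker configuration onto one motion-capture frame (Kabsch algorithm).

    ref, frame : (n_markers, 3) arrays of 3D marker positions.
    Returns the (n_markers, 3) rigid reconstruction of `frame`.
    """
    c_ref, c_frame = ref.mean(axis=0), frame.mean(axis=0)
    A, B = ref - c_ref, frame - c_frame          # centred point sets
    U, _, Vt = np.linalg.svd(A.T @ B)            # SVD of the cross-covariance
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # rotation mapping A onto B
    return A @ R.T + c_frame                     # rigid part of this frame

def split_rigid_nonrigid(markers, ref_frame=0):
    """Split a marker sequence (n_frames, n_markers, 3) into a rigid
    head-motion component and a non-rigid (facial deformation) residual."""
    ref = markers[ref_frame]
    rigid = np.stack([rigid_fit(ref, f) for f in markers])
    nonrigid = markers - rigid                   # deformation the rigid fit cannot explain
    return rigid, nonrigid

if __name__ == "__main__":
    # Synthetic data standing in for OPTOTRAK output: 100 frames, 20 markers.
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(100, 20, 3))
    rigid, nonrigid = split_rigid_nonrigid(seq)
    print(rigid.shape, nonrigid.shape)
```

Either component can then be used to drive a display on its own, which is how rigid-only and non-rigid-only stimuli of the kind described above could be constructed.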
