Multimodal perceptual organization of speech: Evidence from tone analogs of spoken utterances

Theoretical and practical motives alike have prompted recent investigations of multimodal speech perception. Theoretically, multimodal studies have extended the conceptualization of perceptual organization beyond the familiar modality-bound accounts deriving from Gestalt psychology. Practically, such investigations have been driven by a need to understand the proficiency of multimodal speech perception using an electrocochlear prosthesis for hearing. In each domain, studies have shown that perceptual organization of speech can occur even when the perceiver's auditory experience departs from natural speech qualities. Accordingly, our research examined auditory-visual multimodal integration of a video image of an articulating face with selected acoustic constituents of the speech signal, each realized as a single sinewave tone accompanying the video image. The single tone reproduced the frequency and amplitude of the phonatory cycle or of one of the three lower oral formants. Our results showed a distinct advantage for the condition pairing the video image of the face with a sinewave replicating the second formant, despite its unnatural timbre and its presentation in acoustic isolation from the rest of the speech signal. Perceptual coherence of multimodal speech in these circumstances is established when the two modalities concurrently specify the same underlying phonetic attributes.