Alignment to visual speech information

Speech alignment is the tendency for interlocutors to unconsciously imitate one another’s speaking style. Alignment also occurs when a talker is asked to shadow recorded words (e.g., Shockley, Sabadini, & Fowler, 2004). In two experiments, we examined whether alignment could be induced with visual (lipread) speech and with auditory speech. In Experiment 1, we asked subjects to lipread and shadow out loud a model silently uttering words. The results indicate that shadowed utterances sounded more similar to the model’s utterances than did subjects’ nonshadowed read utterances, suggesting that speech alignment can be based on visual speech. In Experiment 2, we tested whether raters could perceive alignment across modalities: raters judged which of a subject’s audio utterances, shadowed or read, was more similar to the model’s visual (silent video) utterance. The shadowed utterances were again judged as more similar to the model’s than were the read utterances, suggesting that raters are sensitive to cross-modal similarity between aligned words.
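The cross-modal rating task in Experiment 2 reduces to a simple chance-level analysis: on each trial a rater picks whichever of a subject’s two audio tokens (shadowed or read) sounds more similar to the model’s silent video token, so alignment is detectable whenever shadowed tokens are chosen above the 50% chance rate. The Python sketch below illustrates that scoring logic only; the trial data are invented, and the binomial test is an illustrative analysis choice, not the paper’s reported method.

    import math

    def binomial_p_at_least(k, n, p=0.5):
        # One-tailed probability of k or more successes out of n trials
        # under the chance hypothesis (rater guessing, p = 0.5).
        return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    # Hypothetical trials: 1 = rater judged the shadowed token more similar
    # to the model's silent video token, 0 = rater chose the read token.
    trials = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1,
              0, 1, 1, 1, 1, 0, 1, 1, 1, 1]

    k, n = sum(trials), len(trials)
    print(f"Shadowed token chosen on {k}/{n} trials ({k / n:.0%})")
    print(f"One-tailed binomial p vs. chance: {binomial_p_at_least(k, n):.4f}")

With these invented data (16 of 20 trials favoring the shadowed token), the sketch reports p ≈ .006; a result of this form is what the similarity judgments in Experiment 2 would yield if alignment were perceptible across modalities.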

[1] J. Navarra, et al. Hearing lips in a second language: Visual articulatory information enables the perception of second language sounds, 2007, Psychological Research.

[2] W. H. Sumby, et al. Visual contribution to speech intelligibility in noise, 1954.

[3] H. McGurk, et al. Hearing lips and seeing voices, 1976, Nature.

[4] L. Rosenblum, et al. Visual speech information for face recognition, 2002, Perception & Psychophysics.

[5] H. Bekkering, et al. Motor activation from visible speech: Evidence from stimulus response compatibility, 2000, Journal of Experimental Psychology: Human Perception and Performance.

[6] L. Rosenblum, et al. Look who’s talking: Recognizing friends from visible articulation, 2007, Perception.

[7] L. Rosenblum, et al. Lip-read me now, hear me better later, 2006, Psychological Science.

[8] C. Fowler, et al. Rapid access to speech gestures in perception: Evidence from choice and simple response time tasks, 2003, Journal of Memory and Language.

[9] D. Pisoni, et al. Crossmodal source identification in speech perception, 2004, Ecological Psychology.

[10] P. McGuire, et al. Silent speechreading in the absence of scanner noise: An event-related fMRI study, 2000, Neuroreport.

[11] M. Gentilucci, et al. Imitation during phoneme production, 2007, Neuropsychologia.

[12] R. Campbell, et al. Speechreading circuits in people born deaf, 2002, Neuropsychologia.

[13] P. Arnold, et al. Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact, 2001, British Journal of Psychology.

[14] L. Nygaard, et al. Gender differences in vocal accommodation, 2002.

[15] C. Fowler, et al. Gestural drift in a bilingual speaker of Brazilian Portuguese and English, 1997.

[16] K. Shockley, et al. Imitation in shadowing words, 2004, Perception & Psychophysics.

[17] A. Meltzoff, et al. Explaining facial imitation: A theoretical model, 1997, Early Development & Parenting.

[18] E. Vatikiotis-Bateson, et al. ‘Putting the face to the voice’: Matching identity across modality, 2003, Current Biology.

[19] S. Schweinberger, et al. Speaker variations influence speechreading speed for dynamic faces, 2005, Perception.

[20] L. Rosenblum, et al. Primacy of multimodal speech perception, 2008.

[21] W. Thalheimer, et al. How to calculate effect sizes from published research: A simplified methodology, 2002.

[22] J. Pardo, et al. On phonetic convergence during conversational interaction, 2006, The Journal of the Acoustical Society of America.

[23] E. Bullmore, et al. Activation of auditory cortex during silent lipreading, 1997, Science.

[24] M. Natale. Convergence of mean vocal intensity in dyadic communication as a function of social desirability, 1975.

[25] S. W. Gregory, et al. Analysis of fundamental frequency reveals covariation in interview partners’ speech, 1990.

[26] L. Rosenblum, et al. Hearing a face: Cross-modal speaker matching using isolated visible speech, 2006, Perception & Psychophysics.

[27] T. Chartrand, et al. The chameleon effect: The perception-behavior link and social interaction, 1999, Journal of Personality and Social Psychology.

[28] S. M. Sheffert, et al. Audiovisual speech facilitates voice learning, 2004, Perception & Psychophysics.

[29] C. Fowler, et al. The effects of voice and visible speaker change on memory for spoken words, 1995.

[30] R. Chen, et al. Observation–execution matching system for speech: A magnetic stimulation study, 2001, Neuroreport.

[31] A. Baker, et al. The development of phonology in the blind child, 1987.

[32] J. Kim, et al. Repeating and remembering foreign language words: Implications for language teaching systems, 2001, Artificial Intelligence Review.

[33] K. Lander, et al. Does face familiarity influence speechreadability? 2008, Quarterly Journal of Experimental Psychology.

[34] D. Pisoni, et al. Specification of cross-modal source information in isolated kinematic displays of speech, 2004, The Journal of the Acoustical Society of America.

[35] D. Pisoni, et al. Cross-modal source information and spoken word recognition, 2004, Journal of Experimental Psychology: Human Perception and Performance.

[36] S. Furui, et al. Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance, 2008, Computer Speech & Language.

[37] D. Reisberg, et al. Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli, 1987.

[38] S. Goldinger. Echoes of echoes? An episodic theory of lexical access, 1998, Psychological Review.

[39] L. A. Chistovich, et al. Speech: Articulation and perception, 1965.

[40] H. Giles, et al. Accommodation theory: Communication, context, and consequence, 1991.

[41] S. Goldinger, et al. Episodic memory reflected in printed word naming, 2004, Psychonomic Bulletin & Review.

[42] J. Pardo, et al. The perception of speech, 2006.

[43] L. Rosenblum, et al. Effects of talker variability on speechreading, 2000, Perception & Psychophysics.

[44] G. Rizzolatti, et al. Speech listening specifically modulates the excitability of tongue muscles: A TMS study, 2002, European Journal of Neuroscience.

[45] F. X. Castellanos, et al. Speech-production measures of speech perception: Rapid shadowing of VCV syllables, 1980, The Journal of the Acoustical Society of America.

[46] P. F. Seitz, et al. The use of visible speech cues for improving auditory detection of spoken sentences, 2000, The Journal of the Acoustical Society of America.

[47] K. Shockley, et al. Mutual interpersonal postural constraints are involved in cooperative conversation, 2003, Journal of Experimental Psychology: Human Perception and Performance.

[48] R. J. Porter, et al. Rapid reproduction of vowel-vowel sequences: Evidence for a fast and direct acoustic-motoric linkage in speech, 1980, Journal of Speech and Hearing Research.

[49] S. Schweinberger, et al. Asymmetric relationships among perceptions of facial identity, emotion, and facial speech, 1998, Journal of Experimental Psychology: Human Perception and Performance.