Speech segmentation is facilitated by visual cues

Evidence from infant studies indicates that language learning can be facilitated by multimodal cues. We extended this observation to adult language learning by studying the effects of simultaneous visual cues (nonassociated object images) on speech segmentation performance. Our results indicate that segmentation of new words from a continuous speech stream is facilitated by simultaneous visual input presented at or near syllables that exhibit the low transitional probability indicative of word boundaries. This suggests that temporal audio-visual contiguity helps direct attention to word boundaries at the earliest stages of language learning. Off-boundary or arrhythmic picture sequences did not affect segmentation performance, suggesting that the language learning system can effectively disregard noninformative visual information. Detection of temporal contiguity between multimodal stimuli may be useful to both infants and second-language learners, not only for facilitating speech segmentation but also for detecting word–object relationships in natural environments.