Multimodal Attention Creates the Visual Input for Infant Word Learning

Infant language acquisition is fundamentally an embodied process, relying on the body to select information from the learning environment. Infants show their attention to an object not merely by gazing at it, but also by orienting their body toward it and generating various manual actions on it, such as holding, touching, and shaking. The goal of the present study was to examine how multimodal attention shapes infant word learning in real time. Infants and their parents played in a home-like lab with unfamiliar objects that had been assigned novel labels. While playing, participants wore wireless head-mounted eye trackers to capture visual attention. Infants were then tested on their knowledge of the new words. We identified all the utterances in which parents labeled the objects that were later learned or not learned and analyzed infant multimodal attention during and around labeling. We found that the proportion of time spent in hand-eye coordination predicted learning outcomes. To understand the learning advantage that hand-eye coordination creates, we compared the size of the labeled objects in the infant's field of view. Although there were no differences in object size between utterances labeling learned and not learned objects, hand-eye coordination created the most informative views. Together, these results suggest that in-the-moment word learning may be driven by the greater access to informative object views that hand-eye coordination affords.
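
To illustrate the kind of measure described above, the sketch below shows one way the proportion of a labeling utterance spent in hand-eye coordination could be computed from timestamped annotations. This is a minimal illustration under assumed data structures, not the authors' analysis pipeline; the function and variable names (hand_eye_proportion, gaze_intervals, hand_intervals, object IDs) are hypothetical.

```python
# Minimal sketch (not the authors' code): proportion of a labeling utterance
# during which the infant both looked at and manually contacted the labeled
# object. All names and data formats here are illustrative assumptions.

def hand_eye_proportion(utterance, gaze_intervals, hand_intervals, target):
    """utterance       : (start, end) of the parent's labeling utterance, in seconds
    gaze_intervals  : list of (start, end, object_id) gaze fixations
    hand_intervals  : list of (start, end, object_id) manual-contact bouts
    target          : object_id of the labeled object
    Assumes the annotation intervals for a given object do not overlap each other.
    """
    u_start, u_end = utterance
    duration = u_end - u_start
    if duration <= 0:
        return 0.0

    coordinated = 0.0
    for g_start, g_end, g_obj in gaze_intervals:
        if g_obj != target:
            continue
        for h_start, h_end, h_obj in hand_intervals:
            if h_obj != target:
                continue
            # Time when gaze, manual contact, and the utterance all overlap.
            s = max(g_start, h_start, u_start)
            e = min(g_end, h_end, u_end)
            coordinated += max(0.0, e - s)
    return coordinated / duration


# Example: a 3 s labeling utterance with 1.5 s of simultaneous looking + holding
utt = (10.0, 13.0)
gaze = [(9.5, 12.0, "toy_A")]
hands = [(10.5, 14.0, "toy_A")]
print(hand_eye_proportion(utt, gaze, hands, "toy_A"))  # 0.5
```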