Models of Cross-Situational and Crossmodal Word Learning in Task-Oriented Scenarios

We present two related but distinct cross-situational and crossmodal models of incremental word learning. Model 1 is a Bayesian approach for jointly learning object-word mappings and referential intention. It supports incremental learning from only a few situations in which the display of referents to the learning system is systematically varied. We demonstrate the robustness of the model to sensory noise, including errors in the visual (object recognition) and auditory (word recognition) systems. The model is then integrated into a cognitive robotic architecture to realize cross-situational word learning on a robot. Model 2 takes a different approach: it is an information-theoretic model, based on pointwise mutual information, for learning object and action words from modality-rich input data. The approach is inspired by insights from language development and learning, where the caregiver or teacher typically shows objects and performs actions for the infant while naming what is being done. We demonstrate the word learning capabilities of the model by feeding it crossmodal input data from two German multimodal corpora, which comprise visual scenes of performed actions and related utterances.
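To make the pointwise mutual information idea behind Model 2 concrete, the following is a minimal batch sketch, not the model described in the paper. It assumes each situation is a pair of (uttered words, perceived referents), estimates probabilities from simple co-occurrence counts, and scores a word-referent pair as PMI(w, r) = log2( p(w, r) / (p(w) p(r)) ). The function names `pmi_table` and `best_referent`, the English toy words, and the upper-case referent labels are all hypothetical and chosen only for illustration; the actual model may normalize, smooth, or update its estimates incrementally in ways not shown here.

```python
import math
from collections import Counter


def pmi_table(situations):
    """Return PMI scores for every co-occurring (word, referent) pair.

    `situations` is a list of (words, referents) pairs. Each element is
    counted at most once per situation (presence, not frequency).
    This is an illustrative assumption, not the paper's procedure.
    """
    n = len(situations)
    word_counts, ref_counts, pair_counts = Counter(), Counter(), Counter()
    for words, referents in situations:
        ws, rs = set(words), set(referents)
        word_counts.update(ws)
        ref_counts.update(rs)
        pair_counts.update((w, r) for w in ws for r in rs)
    return {
        (w, r): math.log2((c / n) / ((word_counts[w] / n) * (ref_counts[r] / n)))
        for (w, r), c in pair_counts.items()
    }


def best_referent(table, word):
    """Return the referent with the highest PMI for `word`, or None."""
    candidates = {r: s for (w, r), s in table.items() if w == word}
    return max(candidates, key=candidates.get) if candidates else None


# Invented toy data: words on the left, object/action labels on the right.
situations = [
    (["take", "the", "bottle"], ["TAKE", "BOTTLE"]),
    (["take", "the", "box"], ["TAKE", "BOX"]),
    (["push", "the", "bottle"], ["PUSH", "BOTTLE"]),
    (["push", "the", "box"], ["PUSH", "BOX"]),
]
table = pmi_table(situations)
print(best_referent(table, "bottle"), best_referent(table, "take"))
```

In this toy setting, a word such as "the" that occurs in every situation has a PMI of zero with every referent, which illustrates why a mutual-information score can separate content words from function words without any syntactic knowledge.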