Embodied Multi-modal Interaction in Language learning: the EMIL data collection

Humans develop cognitive functions from a body-rational perspective. In particular, infants develop representations through sensorimotor interactions with their environment and through goal-directed actions [1]. This embodiment plays a major role in modeling cognitive functions, from active perception to natural language learning. For the developmental robotics community, which works with humanoid robotic proxies, datasets that provide low-level multi-modal perception during these environmental interactions are of particular interest [2].

Related Data Sets: In recent years, many labs have made considerable efforts to provide such datasets, focusing on different research goals while also taking technical limitations into account. Examples include the KIT Motion-Language set of descriptions of whole-body poses [3], the MOD165 set of a gripper-robot with vision, audio, and tactile senses interacting with objects [4], the CORe50 set focusing on a human perspective and vision [5], and the similar but upscaled EMMI and iCubWorld sets [6]. However, none of these corpora provide truly continuous multi-modal perception of interaction episodes, as we would expect an infant to experience.

In this preview, we introduce the Embodied Multi-modal Interaction in Language learning (EMIL) data collection, an ongoing series of datasets for studying human cognitive functions on developmental robots. Since we aim to utilize these resources in tight collaboration with the research community, we propose the first set, on object manipulation, to foster discussion of future directions and needs within the community.

[1] J. Tani. Exploring Robotic Minds: Actions, Symbols, and Consciousness as Self-Organizing Dynamic Phenomena. 2016.

[2] D. Maltoni et al. CORe50: A New Dataset and Benchmark for Continuous Object Recognition. CoRL, 2017.

[3] P.-Y. Oudeyer et al. Computational Theories of Curiosity-Driven Learning. arXiv, 2018.

[4] S. Wermter et al. NICO - Neuro-Inspired Companion: A Developmental Humanoid Robot Platform for Multimodal Interaction. IEEE RO-MAN, 2017.

[5] S. Nolfi et al. Embodied Language Learning and Cognitive Bootstrapping: Methods and Design Principles. 2016.

[6] F. Monteiro Eliott et al. An Object is Worth Six Thousand Pictures: The Egocentric, Manual, Multi-Image (EMMI) Dataset. IEEE ICCV Workshops, 2017.

[7] S. Wermter et al. Interactive Natural Language Acquisition in a Multi-modal Recurrent Neural Architecture. Connection Science, 2017.

[8] S. Wermter et al. Continual Lifelong Learning with Neural Networks: A Review. Neural Networks, 2019.

[9] T. Asfour et al. The KIT Motion-Language Dataset. Big Data, 2016.

[10] Z. Liu et al. Crossmodal Language Grounding, Learning, and Teaching. CoCo@NIPS, 2016.

[11] T. Nakamura et al. Ensemble-of-Concept Models for Unsupervised Formation of Multiple Categories. IEEE Transactions on Cognitive and Developmental Systems, 2018.

[12] S. Wermter et al. Modeling Development of Natural Multi-sensory Integration Using Neural Self-organisation and Probabilistic Population Codes. Connection Science, 2015.

[13] A. Cangelosi et al. Developmental Robotics: From Babies to Robots. 2015.