Integration of various concepts and grounding of word meanings using multi-layered multimodal LDA for sentence generation

In the field of intelligent robotics, object handling by robots requires capturing not only object concepts through object categorization, but also other concepts (e.g., the motion performed while using an object) and the relationships between concepts. Moreover, capturing concepts of places and people is necessary for a robot to gain real-world understanding. In this study, we propose multi-layered multimodal latent Dirichlet allocation (mMLDA), which enables robots to form various concepts and to integrate them. Because mMLDA performs concept formation and integration jointly, the formation of each concept influences the others, resulting in more appropriate concepts. Another issue addressed in this paper is language acquisition by robots. We propose a method that uses the mutual information between words and concepts to infer which words are intrinsically connected to each concept. Moreover, the order of concepts in teaching utterances, which corresponds to a grammar, can be learned using a simple Markov model. This grammar can then be used to generate sentences that describe the observed information. We report experimental results evaluating the effectiveness of the proposed method.
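The two language-learning mechanisms mentioned in the abstract can be illustrated with a minimal sketch: scoring word-concept connections by mutual information over binary occurrence variables, and learning concept order in teaching utterances with a bigram Markov model. This is an assumption-laden illustration, not the paper's implementation; the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def word_concept_mi(observations):
    """Score each (word, concept) pair by the mutual information between
    the binary events "word occurs in the utterance" and "the utterance
    describes this concept". High-MI words are candidates for being
    connected to the concept; low-MI words behave like function words.
    (Sketch only; the paper's exact estimator may differ.)"""
    n = len(observations)
    word_count, concept_count, joint = Counter(), Counter(), Counter()
    vocab, concepts = set(), set()
    for words, concept in observations:
        concepts.add(concept)
        concept_count[concept] += 1
        for w in set(words):
            vocab.add(w)
            word_count[w] += 1
            joint[(w, concept)] += 1
    scores = {}
    for w in vocab:
        for c in concepts:
            # 2x2 contingency table for (word present?, concept present?)
            n11 = joint[(w, c)]
            n10 = word_count[w] - n11
            n01 = concept_count[c] - n11
            n00 = n - n11 - n10 - n01
            marg_w = {1: word_count[w], 0: n - word_count[w]}
            marg_c = {1: concept_count[c], 0: n - concept_count[c]}
            cells = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
            mi = 0.0
            for (ww, cc), nij in cells.items():
                if nij > 0:
                    mi += (nij / n) * math.log(nij * n / (marg_w[ww] * marg_c[cc]))
            scores[(w, c)] = mi
    return scores

def learn_concept_bigrams(sequences):
    """Learn bigram transition probabilities over concept labels in
    teaching utterances -- a simple Markov model of concept order that
    can serve as a grammar for ordering words during sentence generation."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return {a: {b: k / sum(c.values()) for b, k in c.items()}
            for a, c in counts.items()}

# Toy teaching data (hypothetical): each utterance is a bag of words plus
# the concept category annotating the scene.
observations = [
    ({"the", "cup", "grasp"}, "object"),
    ({"the", "cup", "red"}, "object"),
    ({"the", "walk"}, "motion"),
    ({"the", "walk", "fast"}, "motion"),
]
scores = word_concept_mi(observations)

# Concept order observed in utterances, e.g. object mentioned before motion.
grammar = learn_concept_bigrams([["object", "motion"], ["object", "motion"]])
```

In this toy setting, "cup" scores higher MI with the object concept than the ubiquitous "the" (whose MI is zero), and the learned bigram model assigns probability 1 to the object-then-motion order.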
