Grounding of Word Meanings in Latent Dirichlet Allocation-Based Multimodal Concepts

In this paper we propose a latent Dirichlet allocation (LDA)-based framework for multimodal categorization and word grounding by robots. The robot uses its physical embodiment to grasp and observe an object from various viewpoints, as well as to listen to the sound it produces during the observation. This multimodal information is used to categorize objects and form multimodal concepts using multimodal LDA. At the same time, words acquired during the observation are connected to the related concepts represented by the multimodal LDA. We also provide a relevance measure that encodes the degree of connection between words and modalities. The proposed algorithm is implemented on a robot platform, and experiments are carried out to evaluate it. We also demonstrate a simple conversation between a user and the robot based on the learned model.
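To make the idea concrete, below is a minimal illustrative sketch of multimodal LDA in Python, in which each observed object is treated as a "document" and each modality contributes a bag of discretized features sharing the same latent concepts (topics). The toy data, codebook sizes, hyperparameters, and the use of collapsed Gibbs sampling are assumptions for illustration only; the paper's actual feature extractors, inference procedure, and relevance measure may differ.

```python
# Illustrative multimodal LDA with collapsed Gibbs sampling (toy data; not the
# paper's implementation). Each object is a "document"; each modality contributes
# a bag of discretized features (e.g., vector-quantized visual features and words).
import numpy as np

rng = np.random.default_rng(0)

K = 3                    # number of latent concepts (topics); assumed value
ALPHA, BETA = 1.0, 0.1   # symmetric Dirichlet hyperparameters; assumed values

# Toy corpus: 4 objects, 2 modalities, features given as integer codeword ids.
corpus = [
    {"vision": [0, 0, 1], "word": [0, 0]},
    {"vision": [0, 1, 1], "word": [0]},
    {"vision": [2, 3, 3], "word": [1, 1]},
    {"vision": [3, 3, 2], "word": [1]},
]
vocab = {"vision": 4, "word": 2}   # codebook size per modality (assumed)

# Count tables: object-topic counts and, per modality, topic-feature counts.
n_dk = np.zeros((len(corpus), K))
n_kv = {m: np.zeros((K, V)) for m, V in vocab.items()}
n_k = {m: np.zeros(K) for m in vocab}
assign = []  # one topic assignment per (object, modality, token)

# Random initialization of topic assignments.
for d, obj in enumerate(corpus):
    for m, feats in obj.items():
        for i, v in enumerate(feats):
            z = rng.integers(K)
            assign.append((d, m, i, v, z))
            n_dk[d, z] += 1
            n_kv[m][z, v] += 1
            n_k[m][z] += 1

# Collapsed Gibbs sweeps over the tokens of all modalities jointly, so the
# per-object topic proportions are shared across modalities.
for _ in range(200):
    new_assign = []
    for d, m, i, v, z in assign:
        n_dk[d, z] -= 1; n_kv[m][z, v] -= 1; n_k[m][z] -= 1
        p = (n_dk[d] + ALPHA) * (n_kv[m][:, v] + BETA) / (n_k[m] + BETA * vocab[m])
        z = rng.choice(K, p=p / p.sum())
        n_dk[d, z] += 1; n_kv[m][z, v] += 1; n_k[m][z] += 1
        new_assign.append((d, m, i, v, z))
    assign = new_assign

# Word grounding: P(word | concept k) links acquired words to the multimodal
# concepts; a relevance measure between words and modalities could be derived
# from such shared-topic distributions (the paper's definition may differ).
phi_word = (n_kv["word"] + BETA) / (n_k["word"] + BETA * vocab["word"])[:, None]
print(np.round(phi_word, 2))
```

Because every modality's features are explained by the same per-object topic proportions, a word token and, say, a visual codeword become associated whenever they tend to be generated by the same latent concept, which is the intuition behind grounding word meanings in multimodal concepts.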
