Mutual learning of an object concept and language model based on MLDA and NPYLM

Humans develop object concepts by classifying objects into categories, and at the same time acquire language through interaction with others. The meaning of a word can thus be learnt by connecting recognized words to these concepts. We consider this ability to be important in allowing robots to flexibly develop their knowledge of language and concepts, and accordingly we propose a method that enables robots to acquire such knowledge. Object concepts are formed by classifying multimodal information acquired from objects, and the language model is acquired from human speech describing object features. We propose a stochastic model of language and concepts, and knowledge is learnt by estimating the model's parameters. The key point is that language and concepts are interdependent: the same words are highly likely to be uttered about objects in the same category, and objects about which the same words are uttered are highly likely to share features. Exploiting this relation, the proposed method improves the accuracy of both speech recognition and object classification. However, the parameters of the proposed model are difficult to estimate directly because there are so many of them. We therefore approximate the proposed model and estimate its parameters using a nested Pitman-Yor language model (NPYLM) and multimodal latent Dirichlet allocation (MLDA) to acquire the language and the concepts, respectively.
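The interdependence between word use and object category can be illustrated with a toy sketch. Everything below — the data, the feature names, and the clustering routine — is invented for illustration: the actual method uses MLDA over multimodal sensor data and NPYLM for word segmentation, not the simple hard-EM mixture of multinomials shown here. The sketch only shows the core idea that pooling uttered words and perceptual features into one bag of tokens lets each modality sharpen the categories inferred from the other:

```python
import itertools
import math
from collections import defaultdict

# Hypothetical toy data: each object pairs visual features with words a
# human uttered about it.
OBJECTS = [
    {"visual": ["red", "round"], "words": ["apple", "fruit"]},
    {"visual": ["red", "round"], "words": ["apple"]},
    {"visual": ["yellow", "long"], "words": ["banana", "fruit"]},
    {"visual": ["yellow", "long"], "words": ["banana"]},
]

def tokens(obj):
    # Pool both modalities into one bag of tagged tokens, so that word and
    # visual co-occurrence jointly drive the categories (the "mutual" part).
    return [f"v:{t}" for t in obj["visual"]] + [f"w:{t}" for t in obj["words"]]

def cluster(objects, K=2, iters=50):
    """Hard-EM for a mixture of multinomials: a crude stand-in for MLDA."""
    vocab = sorted({t for o in objects for t in tokens(o)})

    def fit(init):
        assign = list(init)
        loglik = float("-inf")
        for _ in range(iters):
            # M-step: per-category token counts with Laplace smoothing.
            counts = [defaultdict(int) for _ in range(K)]
            totals = [0] * K
            for o, k in zip(objects, assign):
                for t in tokens(o):
                    counts[k][t] += 1
                    totals[k] += 1

            def logp(o, k):
                return sum(
                    math.log((counts[k][t] + 1) / (totals[k] + len(vocab)))
                    for t in tokens(o)
                )

            # E-step: reassign each object to its most likely category.
            new = [max(range(K), key=lambda k: logp(o, k)) for o in objects]
            loglik = sum(logp(o, k) for o, k in zip(objects, new))
            if new == assign:
                break
            assign = new
        return loglik, assign

    # The toy set is small enough to restart EM from every possible hard
    # assignment and keep the best local optimum.
    inits = itertools.product(range(K), repeat=len(objects))
    return max(fit(init) for init in inits)[1]

if __name__ == "__main__":
    # Objects 0 and 1 end up in one category, 2 and 3 in the other.
    print(cluster(OBJECTS))
```

In the full model the loop closes in both directions: the per-category word distributions learnt here would in turn re-rank speech-recognition hypotheses, so better categories yield better word recognition and vice versa.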
