Multimodal categorization by hierarchical dirichlet process

In this paper, we propose a nonparametric Bayesian framework for categorizing multimodal sensory signals such as audio, visual, and haptic information by robots. The robot uses its physical embodiment to grasp and observe an object from various viewpoints as well as listen to the sound during the observation. The multimodal information enables the robot to form human-like object categories that are bases of intelligence. The proposed method is an extension of Hierarchical Dirichlet Process (HDP), which is a kind of nonparametric Bayesian models, to multimodal HDP (MHDP). MHDP can estimate the number of categories, while the parametric model, e.g. LDA-based categorization, requires to specify the number in advance. As this is an unsupervised learning method, a human user does not need to give any correct labels to the robot and it can classify objects autonomously. At the same time the proposed method provides a probabilistic framework for inferring object properties from limited observations. Validity of the proposed method is shown through some experimental results.

[1]  Tomoaki Nakamura,et al.  Bag of multimodal LDA models for concept formation , 2011, 2011 IEEE International Conference on Robotics and Automation.

[2]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[3]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4]  Adrian Corduneanu,et al.  Variational Bayesian Model Selection for Mixture Distributions , 2001 .

[5]  Thomas L. Griffiths,et al.  Modeling Transfer Learning in Human Categorization with the Hierarchical Dirichlet Process , 2010, ICML.

[6]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[7]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[8]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[9]  M. Escobar,et al.  Bayesian Density Estimation and Inference Using Mixtures , 1995 .

[10]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[11]  Tomoaki Nakamura,et al.  Grounding of word meanings in multimodal concepts using LDA , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[12]  D. Aldous Exchangeability and related topics , 1985 .

[13]  Hagai Attias,et al.  Inferring Parameters and Structure of Latent Variable Models by Variational Bayes , 1999, UAI.

[14]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[15]  Chong Wang,et al.  Simultaneous image classification and annotation , 2009, CVPR.

[16]  Tomoaki Nakamura,et al.  Multimodal object categorization by a robot , 2007, 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[17]  École d'été de probabilités de Saint-Flour,et al.  École d'été de probabilités de Saint-Flour XIII - 1983 , 1985 .

[18]  S. MacEachern,et al.  Bayesian Density Estimation and Inference Using Mixtures , 2007 .

[19]  Michael I. Jordan,et al.  Variational inference for Dirichlet process mixtures , 2006 .

[20]  Alexei A. Efros,et al.  Discovering object categories in image collections , 2005 .