Autonomous acquisition of multimodal information for online object concept formation by robots

This paper proposes a robot that acquires multimodal information, i.e., auditory, visual, and haptic information, in a fully autonomous way using its embodiment. We also propose an online algorithm for multimodal categorization based on the acquired multimodal information and words, which are partially given by human users. The proposed framework makes it possible for the robot to learn object concepts naturally during everyday operation, in conjunction with a small amount of linguistic information from human users. To obtain multimodal information, the robot first detects an object on a flat surface. It then grasps and shakes the object to gain haptic and auditory information. To obtain visual information, the robot uses a small hand-held observation table, so that it can control the viewpoint from which the object is observed. For multimodal concept formation, multimodal latent Dirichlet allocation (LDA) with Gibbs sampling is extended to an online version. The proposed algorithms are implemented on a real robot and tested with real everyday objects to show the validity of the proposed system.
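The sketch below illustrates one plausible way such an online (incremental) collapsed Gibbs sampler for multimodal LDA could be organized: per-modality topic-word counts persist across objects, and only the topic assignments of the newly observed object are resampled. All names and values here (K, ALPHA, BETA, the modality vocabulary sizes, learn_object) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of online multimodal LDA with collapsed Gibbs sampling.
# Topics (object concepts) are shared across modalities; each modality keeps
# its own topic-word counts. Hyperparameters and vocabularies are made up.
import numpy as np

K = 10                       # number of object concepts (topics) -- assumed
ALPHA, BETA = 1.0, 0.1       # symmetric Dirichlet hyperparameters -- assumed
VOCAB = {"visual": 500, "audio": 50, "haptic": 15, "word": 200}  # codebook sizes

# Global sufficient statistics, kept across objects (the "online" part).
n_kw = {m: np.zeros((K, v)) for m, v in VOCAB.items()}  # topic-word counts
n_k = {m: np.zeros(K) for m in VOCAB}                   # per-topic totals

rng = np.random.default_rng(0)

def learn_object(obs, n_sweeps=50):
    """Incrementally learn one object.

    obs maps modality -> list of quantized feature indices (e.g. visual,
    audio, and haptic codewords, plus word IDs) observed for the new object.
    Returns the inferred topic mixture for the object.
    """
    n_jk = np.zeros(K)  # per-object topic counts
    z = {m: rng.integers(K, size=len(obs.get(m, []))) for m in VOCAB}
    # Initialize global and local counts with the random assignments.
    for m in VOCAB:
        for i, w in enumerate(obs.get(m, [])):
            k = z[m][i]
            n_kw[m][k, w] += 1; n_k[m][k] += 1; n_jk[k] += 1
    # Gibbs sweeps over the new object's assignments only; earlier objects
    # are summarized by the global counts and are not revisited.
    for _ in range(n_sweeps):
        for m in VOCAB:
            for i, w in enumerate(obs.get(m, [])):
                k = z[m][i]  # remove the current assignment
                n_kw[m][k, w] -= 1; n_k[m][k] -= 1; n_jk[k] -= 1
                # Collapsed conditional p(z_i = k | everything else).
                p = (n_jk + ALPHA) * (n_kw[m][:, w] + BETA) \
                    / (n_k[m] + BETA * VOCAB[m])
                k = rng.choice(K, p=p / p.sum())
                z[m][i] = k  # add back under the sampled topic
                n_kw[m][k, w] += 1; n_k[m][k] += 1; n_jk[k] += 1
    return n_jk / max(n_jk.sum(), 1.0)
```

Under this arrangement, each processed object leaves its feature-topic counts in the global statistics, so later objects are categorized against an ever-growing model without re-running Gibbs sampling over the whole observation history.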
