Robots move: Bootstrapping the development of object representations using sensorimotor coordination

This paper addresses the unsupervised learning of object representations by fusing visual and motor information. The problem is posed for a mobile robot that develops its representations incrementally as it gathers data. The scenario is challenging because the robot has only limited information at each time step with which to generate and update its representations. Object representations are refined as multiple instances of sensory data are presented; however, it is uncertain whether two data instances correspond to the same object, so the refinement process can easily become unstable. The premise of the presented work is that a robot's motor information enables the successful generation of visual representations. An understanding of self-motion allows a prediction to be made before an action is performed, resulting in a stronger belief in data association. The system is implemented as a data-driven partially observable semi-Markov decision process. Object representations are formed as the process's hidden states and are coordinated with motor commands through state transitions. Experiments show that the prediction process is essential for the unsupervised learning method to converge to a solution, improving precision and recall over using sensory data alone.
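
The role of prediction in data association can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a small discrete set of hidden states (object hypotheses), a hand-written transition table for one motor command, and a made-up observation likelihood, and it ignores the semi-Markov timing aspect entirely. The names `predict`, `associate`, and `turn_left` are hypothetical. The sketch compares the belief obtained from an ambiguous visual observation alone with the belief obtained when a motor-conditioned prediction is applied first.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all hypotheses have zero weight")
    return [w / total for w in weights]


def predict(belief, transition):
    """Propagate a belief through a motor command's transition table.

    transition[s][t] is the assumed probability of moving from hidden
    state s to hidden state t when the command is executed.
    """
    n = len(belief)
    return [sum(belief[s] * transition[s][t] for s in range(n)) for t in range(n)]


def associate(prior, likelihood):
    """Combine a prior over hidden states with an observation likelihood."""
    return normalize([p * l for p, l in zip(prior, likelihood)])


# Toy values, chosen only for illustration (not taken from the paper).
belief = [1.0, 0.0, 0.0]      # robot is confident it currently sees object 0
turn_left = [                 # assumed effect of one motor command
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
likelihood = [0.2, 0.4, 0.4]  # new view is ambiguous between objects 1 and 2

# Sensory data alone: a uniform prior leaves the ambiguity unresolved.
sensory_only = associate([1 / 3] * 3, likelihood)

# Prediction before association: the motor command favors object 1.
with_motor = associate(predict(belief, turn_left), likelihood)

print("sensory only:", [round(p, 3) for p in sensory_only])
print("with motor prediction:", [round(p, 3) for p in with_motor])
```

In this toy setting the visual observation cannot separate the second and third hypotheses, whereas the predicted belief concentrates most of the probability on one of them, which is the sense in which prediction strengthens the belief in a data association.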
