Towards Interactive Object Recognition

I. INTRODUCTION

Object recognition is a key component of service robots for finding and handling objects. Current state-of-the-art object recognition systems recognize objects based on static images [7, 8]. However, these systems prove limited when objects are in ambiguous orientations or when distinctive features are hidden, e.g., due to the pose of the object. A popular approach to this problem is active perception [1, 3], in which the robot intelligently moves its camera to reveal more information about the scene. However, this approach fails when distinctive features are hidden, for example, on the bottom side of the object (see Fig. 1). Such cases are particularly common in cluttered environments, where features may be occluded not only by the pose of the object but also by other items in the scene. Recent work in interactive perception has shown that interacting with the scene opens new possibilities for tackling common perception problems. This paper addresses both challenges, selecting an object in a cluttered scene for manipulation and choosing the optimal movement of that object, in an information-theoretic way to improve interactive perception methods.

Interacting with a scene to improve perception by revealing informative surfaces has been explored particularly in the area of segmentation. Examples include interactive segmentation of rigid objects moved by a robot [5], segmentation of articulated objects [4], and disambiguation of segmentation hypotheses [2]. However, none of these approaches reason about which actions to take in order to achieve the goal.

In this work we introduce a probabilistic method for choosing object manipulation actions that optimally reveal information about objects in a scene based on the robot's observations. To the best of our knowledge, the problem of interactive object recognition has not been addressed before.
Our approach determines the optimal action for a robot to interact with objects and adjust their pose so as to reveal discriminative features for determining their identity. In the ambiguous book example (see Fig. 1), this means flipping the book over and observing the cover, which results in a more confident recognition. Our method is based on a probabilistic graphical model for feature-based object and pose recognition. By inferring the posterior distribution over object identities conditioned on all previous actions and observations, our approach enables a robot to select the optimal action to reduce the uncertainty about the object. The key contributions of this approach are: (a) it presents
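The action-selection idea described above can be illustrated with a minimal sketch: maintain a belief over object identities, and for each candidate action compute the expected reduction in entropy (expected information gain) under that action's observation model, then pick the action maximizing it. The discrete observation models and the two-action "book" scenario below are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def expected_information_gain(belief, obs_model):
    """Expected entropy reduction over object identity for one action.

    belief:    (n_objects,) prior P(o)
    obs_model: (n_objects, n_observations) likelihood P(z | o, a)
    """
    h_prior = entropy(belief)
    # Predictive distribution over observations: P(z | a) = sum_o P(z | o, a) P(o)
    p_z = belief @ obs_model
    eig = h_prior
    for z in range(obs_model.shape[1]):
        if p_z[z] == 0:
            continue
        posterior = belief * obs_model[:, z] / p_z[z]  # Bayes rule
        eig -= p_z[z] * entropy(posterior)
    return eig

def select_action(belief, obs_models):
    """Return (best action index, gains) by expected information gain."""
    gains = [expected_information_gain(belief, m) for m in obs_models]
    return int(np.argmax(gains)), gains

# Toy scenario: two books that look identical from above.
belief = np.array([0.5, 0.5])
# Action 0, "observe from above": both objects yield the same observations.
a0 = np.array([[0.5, 0.5],
               [0.5, 0.5]])
# Action 1, "flip over": the covers differ, so observations are informative.
a1 = np.array([[0.9, 0.1],
               [0.1, 0.9]])
best, gains = select_action(belief, [a0, a1])
# best == 1: flipping the book is the more informative action.
```

Greedy one-step information gain is only one possible criterion; the same belief update supports multi-step planning if the action sequence matters.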

[1] F. S. Khan et al., "Fusing Color and Shape for Bag-of-Words Based Object Recognition," in Computational Color Imaging Workshop (CCIW), 2013.

[2] O. Brock et al., "Interactive segmentation for manipulation in unstructured environments," in IEEE International Conference on Robotics and Automation (ICRA), 2009.

[3] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, 2004.

[4] G. A. Hollinger et al., "Active Classification: Theory and Application to Underwater Inspection," in International Symposium on Robotics Research (ISRR), 2011.

[5] G. J. Pappas et al., "Hypothesis testing framework for active object detection," in IEEE International Conference on Robotics and Automation (ICRA), 2013.

[6] P. Abbeel et al., "A textured object recognition pipeline for color and depth image data," in IEEE International Conference on Robotics and Automation (ICRA), 2012.