Towards using multiple cues for robust object recognition

A robot's ability to assist humans in a variety of tasks, e.g. in search and rescue or in a household, heavily depends on its reliable recognition of the objects in its environment. Numerous approaches attempt to recognize objects based only on the robot's vision. However, objects of the same type can vary widely in visual appearance, e.g. in shape, size, pose, and color, so although vision-only approaches have been widely studied with relative success, general object recognition remains very challenging. We build our work upon the fact that robots can observe humans interacting with the objects in their environment, and these interactions provide numerous non-visual cues to those objects' identities. We investigate a flexible object recognition approach that can use multiple cues of any kind, whether visual cues intrinsic to the object or cues provided by observing a human. A key challenge is that different cues can carry different weight in their association with an object definition, and these weights must be taken into account during recognition. In this paper, we contribute a probabilistic relational representation of the cue weights and an object recognition algorithm that flexibly combines multiple cues of any type to recognize objects robustly. We show illustrative results of our implemented approach using visual, activity, gesture, and speech cues, provided by machine or human, to recognize objects more robustly than with any single cue alone.
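As a rough illustration of the idea of weighting heterogeneous cues, the sketch below combines per-cue likelihoods in a weighted log-linear score. The function name, cue vocabulary, and weighting scheme are illustrative assumptions for this sketch, not the paper's actual probabilistic relational model.

```python
# Hypothetical sketch (not the paper's formulation): each cue contributes a
# likelihood P(value | object), and a per-cue weight scales its influence in a
# log-linear combination over candidate object labels.
import math

def recognize(cue_observations, object_models, cue_weights):
    """Return the object label with the highest weighted cue score.

    cue_observations: dict cue_name -> observed value
    object_models:    dict object_label -> {cue_name: {value: P(value | object)}}
    cue_weights:      dict cue_name -> weight reflecting cue-object association strength
    """
    best_label, best_score = None, float("-inf")
    for label, model in object_models.items():
        score = 0.0
        for cue, value in cue_observations.items():
            likelihood = model.get(cue, {}).get(value, 1e-6)  # small floor for unseen values
            score += cue_weights.get(cue, 1.0) * math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Example: a visual cue (shape) and a human-activity cue jointly identify a "cup".
object_models = {
    "cup":    {"shape": {"cylinder": 0.7, "box": 0.1}, "activity": {"drinking": 0.8, "typing": 0.05}},
    "laptop": {"shape": {"cylinder": 0.05, "box": 0.8}, "activity": {"drinking": 0.05, "typing": 0.9}},
}
cue_weights = {"shape": 0.6, "activity": 0.9}
print(recognize({"shape": "cylinder", "activity": "drinking"}, object_models, cue_weights))  # -> "cup"
```

In this toy setup the activity cue is weighted more heavily than shape, so observing a human drinking can disambiguate an object even when its visual appearance is ambiguous.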
