FeCCM for scene understanding: Helping the robot to learn multiple tasks

Helping a robot understand a scene involves many sub-tasks, such as scene categorization, object detection, and geometric labeling. Each sub-task is notoriously hard, and state-of-the-art classifiers already exist for many of them. Because the sub-tasks are related, it is desirable to have an algorithm that captures the correlations among them without requiring any changes to the inner workings of any classifier, and thereby improves the robot's perception. We recently proposed a generic model, the Feedback Enabled Cascaded Classification Model (FeCCM), that lets us take state-of-the-art classifiers as black boxes and improve their performance. In this video, we show that FeCCM can be used to quickly combine existing classifiers for various sub-tasks and to build a shoe-finder robot in a day. The video shows our robot using FeCCM to find a shoe on request.
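
The idea of combining black-box classifiers in a cascade can be illustrated with a minimal sketch. It shows only a feed-forward pass through two layers, in which each second-layer classifier sees its own features plus the outputs of every first-layer classifier; the feedback-based training that FeCCM adds is not shown. The function `cascade_predict`, the toy classifiers, and the feature values are all hypothetical and chosen for illustration, not taken from the authors' implementation.

```python
from typing import Callable, Dict, List

Features = List[float]
# A black-box classifier: we only call it, never look inside it.
Classifier = Callable[[Features], List[float]]


def cascade_predict(
    features: Dict[str, Features],
    first_layer: Dict[str, Classifier],
    second_layer: Dict[str, Classifier],
) -> Dict[str, List[float]]:
    """Run a two-layer cascade over several sub-tasks (feed-forward only)."""
    # Layer 1: each sub-task classifier sees only its own features.
    first_outputs = {task: clf(features[task]) for task, clf in first_layer.items()}

    # Concatenate all first-layer outputs in a fixed (sorted) task order.
    context: List[float] = []
    for task in sorted(first_outputs):
        context.extend(first_outputs[task])

    # Layer 2: each classifier sees its own features plus every sub-task's output.
    return {task: clf(features[task] + context) for task, clf in second_layer.items()}


# Hypothetical toy classifiers, standing in for real scene and object models.
def scene_clf(x: Features) -> List[float]:
    return [1.0 if sum(x) > 1.0 else 0.0]  # 1.0 means "indoor"


def shoe_clf(x: Features) -> List[float]:
    return [max(x)]  # raw shoe-detection score


def shoe_clf_with_context(x: Features) -> List[float]:
    own_score = max(x[:2])   # the two raw object features
    is_indoor = x[-1]        # context order is [object, scene], so scene is last
    # Assumption for illustration: shoes are less likely outdoors.
    return [own_score * (1.0 if is_indoor else 0.5)]


first = {"scene": scene_clf, "object": shoe_clf}
second = {"object": shoe_clf_with_context}

indoor = cascade_predict({"scene": [0.9, 0.4], "object": [0.2, 0.7]}, first, second)
outdoor = cascade_predict({"scene": [0.1, 0.2], "object": [0.2, 0.7]}, first, second)
print(indoor["object"], outdoor["object"])
```

In this toy run the same object features yield a lower shoe score when the scene classifier reports an outdoor scene, which is the kind of cross-task correlation the cascade is meant to exploit. The first-layer classifiers are never modified; only their outputs are reused.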
