What makes a chair a chair?

Many object classes are defined primarily by their function. However, visual object categorization and detection systems have left this fact largely unexploited. We propose a method to learn an affordance detector, which identifies locations in 3D space that “support” a particular function. Instead of relying purely on visual object appearance, our novel approach “imagines” an actor performing an action typical for the target object class. Function is thus treated as a cue complementary to appearance, rather than as a consideration applied after appearance-based detection. Experimental results are given for the functional category “sitting”. The affordance is tested on a 3D representation of the scene, as can realistically be obtained through structure from motion (SfM) or depth cameras. In contrast to appearance-based object detectors, affordance detection requires only very few training examples and, when trained on a few chairs, generalizes very well to other sittable objects such as benches or sofas.
