Reasoning about Object Affordances in a Knowledge Base Representation

Reasoning about objects and their affordances is a fundamental problem for visual intelligence. Most of the previous work casts this problem as a classification task where separate classifiers are trained to label objects, recognize attributes, or assign affordances. In this work, we consider the problem of object affordance reasoning using a knowledge base representation. Diverse information of objects are first harvested from images and other meta-data sources. We then learn a knowledge base (KB) using a Markov Logic Network (MLN). Given the learned KB, we show that a diverse set of visual inference tasks can be done in this unified framework without training separate classifiers, including zero-shot affordance prediction and object recognition given human poses.

[1]  J. Gibson The Ecological Approach to Visual Perception , 1979 .

[2]  Michael R. Lowry,et al.  Learning Physical Descriptions From Functional Definitions, Examples, and Precedents , 1983, AAAI.

[3]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[4]  Michael Fink,et al.  Object Classification from a Single Example Utilizing Class Relevance Metrics , 2004, NIPS.

[5]  Michael Fink Object Classication from a Single Example Utilizing Class Relevance Pseudo-Metrics , 2004, NIPS 2004.

[6]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7]  Shimon Ullman,et al.  Single-example Learning of Novel Classes using Representation by Similarity , 2005, BMVC.

[8]  Matthew Richardson,et al.  Markov logic networks , 2006, Machine Learning.

[9]  Andrew J. Davison,et al.  Active Matching , 2008, ECCV.

[10]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[11]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Pedro M. Domingos,et al.  Lifted First-Order Belief Propagation , 2008, AAAI.

[13]  Larry S. Davis,et al.  Event Modeling and Recognition Using Markov Logic Networks , 2008, ECCV.

[14]  Geoffrey J. Gordon,et al.  Relational learning via collective matrix factorization , 2008, KDD.

[15]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[17]  Larry S. Davis,et al.  Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Bo Zhang,et al.  StatSnowball: a statistical approach to extracting entity relationships , 2009, WWW '09.

[20]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Jennifer Chu-Carroll,et al.  Building Watson: An Overview of the DeepQA Project , 2010, AI Mag..

[22]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[23]  Alexei A. Efros,et al.  From 3D scene geometry to human workspace , 2011, CVPR 2011.

[24]  Luc Van Gool,et al.  What makes a chair a chair? , 2011, CVPR 2011.

[25]  Jason Weston,et al.  Learning Structured Embeddings of Knowledge Bases , 2011, AAAI.

[26]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.

[27]  Leonidas J. Guibas,et al.  Human action recognition by learning bases of action attributes and parts , 2011, 2011 International Conference on Computer Vision.

[28]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[29]  Bernt Schiele,et al.  Evaluating knowledge transfer and zero-shot learning in a large-scale setting , 2011, CVPR 2011.

[30]  Danica Kragic,et al.  Visual object-action recognition: Inferring object affordances from human demonstration , 2011, Comput. Vis. Image Underst..

[31]  Matthieu Guillaumin,et al.  Segmentation Propagation in ImageNet , 2012, ECCV.

[32]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33]  Christopher Ré,et al.  Elementary: Large-Scale Knowledge-Base Construction via Machine Learning and Statistical Inference , 2012, Int. J. Semantic Web Inf. Syst..

[34]  Jonathan Krause,et al.  Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Fei-Fei Li,et al.  Discovering Object Functionality , 2013, 2013 IEEE International Conference on Computer Vision.

[36]  Danqi Chen,et al.  Reasoning With Neural Tensor Networks for Knowledge Base Completion , 2013, NIPS.

[37]  Xinlei Chen,et al.  NEIL: Extracting Visual Knowledge from Web Data , 2013, 2013 IEEE International Conference on Computer Vision.

[38]  Yun Jiang,et al.  Hallucinated Humans as the Hidden Context for Labeling 3D Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Alexei A. Efros,et al.  People Watching: Human Actions as a Cue for Single View Geometry , 2012, International Journal of Computer Vision.

[40]  Hema Swetha Koppula,et al.  Anticipating Human Activities Using Object Affordances for Reactive Robotic Response , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.