Paper information - Reasoning about Object Affordances in a Knowledge Base Representation

Reasoning about objects and their affordances is a fundamental problem for visual intelligence. Most previous work casts this problem as a classification task in which separate classifiers are trained to label objects, recognize attributes, or assign affordances. In this work, we consider the problem of object affordance reasoning using a knowledge base representation. Diverse information about objects is first harvested from images and other meta-data sources. We then learn a knowledge base (KB) using a Markov Logic Network (MLN). Given the learned KB, we show that a diverse set of visual inference tasks, including zero-shot affordance prediction and object recognition given human poses, can be performed in this unified framework without training separate classifiers.
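
To make the idea of answering a query from weighted logical rules more concrete, here is a minimal sketch of Markov-Logic-style inference on a toy problem. It is not the paper's model: the atoms (`is_animal`, `has_four_legs`, `rideable`), the two formulas, and their weights are all hypothetical, chosen only to illustrate how an affordance could be inferred for an object from its attributes without a dedicated classifier. Each possible world is scored by the exponential of the summed weights of the formulas it satisfies, and the query probability is obtained by brute-force enumeration, which is only feasible for a handful of atoms.

```python
import itertools
import math

# Hypothetical ground atoms describing a single object (illustrative names only).
ATOMS = ["is_animal", "has_four_legs", "rideable"]

# Hypothetical weighted formulas: (weight, truth function over a world dict).
FORMULAS = [
    # is_animal AND has_four_legs => rideable
    (1.5, lambda w: (not (w["is_animal"] and w["has_four_legs"])) or w["rideable"]),
    # weak prior that an object is not rideable
    (0.5, lambda w: not w["rideable"]),
]

def world_score(world):
    """Unnormalized score: exp of the summed weights of satisfied formulas."""
    return math.exp(sum(weight for weight, holds in FORMULAS if holds(world)))

def query_probability(query, evidence):
    """P(query is True | evidence), enumerating all worlds consistent with evidence."""
    free = [a for a in ATOMS if a not in evidence]
    numerator = 0.0
    denominator = 0.0
    for values in itertools.product([False, True], repeat=len(free)):
        world = dict(evidence)
        world.update(zip(free, values))
        score = world_score(world)
        denominator += score
        if world[query]:
            numerator += score
    return numerator / denominator

# Query the affordance given only attribute evidence.
p = query_probability("rideable", {"is_animal": True, "has_four_legs": True})
print(f"P(rideable | evidence) = {p:.4f}")
```

With both attributes observed, only two worlds remain, so the result reduces to a logistic function of the weight difference between the two formulas. Real MLN systems learn the weights from data and use approximate inference rather than enumeration.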
