Stating the Obvious: Extracting Visual Common Sense Knowledge

Obtaining common sense knowledge using current information extraction techniques is extremely challenging. In this work, we instead propose to derive simple common sense statements from fully annotated object detection corpora such as the Microsoft Common Objects in Context dataset. We show that many thousands of common sense facts can be extracted from such corpora at high quality. Furthermore, using WordNet and a novel submodular k-coverage formulation, we are able to generalize our initial set of common sense assertions to unseen objects and uncover over 400k potentially useful facts.
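The abstract does not spell out how facts are derived from annotations, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes COCO-style bounding boxes in `[x, y, width, height]` form with the image origin at the top left. The `label` field, the `is_above` test, and the `min_support` and `min_ratio` thresholds are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import permutations


def is_above(box_a, box_b):
    """True if box_a lies entirely above box_b (image y grows downward)."""
    _, ay, _, ah = box_a
    _, by, _, _ = box_b
    return ay + ah <= by


def extract_facts(annotations, min_support=2, min_ratio=0.8):
    """Return (subject, 'above', object) triples that hold consistently.

    A triple is kept when the two labels co-occur at least `min_support`
    times and the relation holds in at least `min_ratio` of those cases.
    Both thresholds are illustrative assumptions.
    """
    # Group annotations by the image they belong to.
    by_image = {}
    for ann in annotations:
        by_image.setdefault(ann["image_id"], []).append(ann)

    together = Counter()  # ordered label pairs seen in the same image
    above = Counter()     # ordered label pairs where the first is above

    for objs in by_image.values():
        for a, b in permutations(objs, 2):
            if a["label"] == b["label"]:
                continue
            pair = (a["label"], b["label"])
            together[pair] += 1
            if is_above(a["bbox"], b["bbox"]):
                above[pair] += 1

    facts = []
    for pair, n in together.items():
        if n >= min_support and above[pair] / n >= min_ratio:
            facts.append((pair[0], "above", pair[1]))
    return sorted(facts)


# Tiny hand-made sample, not taken from any real dataset.
SAMPLE = [
    {"image_id": 1, "label": "sky", "bbox": [0, 0, 100, 30]},
    {"image_id": 1, "label": "grass", "bbox": [0, 60, 100, 40]},
    {"image_id": 2, "label": "sky", "bbox": [0, 0, 100, 40]},
    {"image_id": 2, "label": "grass", "bbox": [0, 50, 100, 50]},
    {"image_id": 3, "label": "dog", "bbox": [10, 50, 30, 30]},
    {"image_id": 3, "label": "grass", "bbox": [0, 60, 100, 40]},
]

if __name__ == "__main__":
    for subject, relation, obj in extract_facts(SAMPLE):
        print(subject, relation, obj)
```

Counting over ordered pairs keeps the relation directional, so "sky above grass" and "grass above sky" are scored separately. The generalization step with WordNet and submodular k-coverage is not covered by this sketch.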
