Multimodal Question Answering over Structured Data with Ambiguous Entities

In recent years, we have witnessed profound changes in the way people satisfy their information needs. For instance, with the ubiquitous 24/7 availability of mobile devices, the number of search engine queries on mobile devices has reportedly overtaken that of queries on regular personal computers. In this paper, we consider the task of multimodal question answering over structured data, in which a user supplies not just a natural language query but also an image. Our system addresses this task by optimizing a non-convex objective function that captures multimodal constraints. Our experiments show that this approach enables the system to answer even very challenging queries involving ambiguous entities with high accuracy.
