Large-Scale Question Answering with Joint Embedding and Proof Tree Decoding

Question answering (QA) over a large-scale knowledge base (KB) such as Freebase is an important natural language processing application. Existing approaches fall into two groups: linguistically oriented semantic parsing techniques and machine-learning-motivated statistical methods. Both face a key challenge: handling the diverse ways in which natural questions about predicates and entities in the KB can be expressed. This paper investigates how to combine the two approaches. We frame the problem from a proof-theoretic perspective and formulate it as a proof tree search problem that seamlessly unifies semantic parsing, logic reasoning, and answer ranking. We combine our word-entity joint embedding, learned from web-scale data, with other surface-form features to further improve accuracy. Our real-time system for the Freebase QA task achieved an F1 score of 47.2 on the test data of the standard Stanford WebQuestions benchmark.
