Open Information Extraction from Question-Answer Pairs

Open Information Extraction (OpenIE) extracts meaningful structured tuples from free-form text. Most previous work on OpenIE extracts from one sentence at a time. We describe NeurON, a system for extracting tuples from question-answer pairs. Because real questions and answers often contain precisely the information that users care about, they are a particularly desirable source for extending a knowledge base. NeurON addresses two challenges: an answer is often hard to understand without its question, and relevant information can span multiple sentences. To address these, NeurON formulates extraction as a multi-source sequence-to-sequence learning task, in which it combines distributed representations of a question and an answer to generate knowledge facts. Experiments on two real-world datasets demonstrate that, compared to state-of-the-art OpenIE methods, NeurON can find a significant number of new and interesting facts with which to extend a knowledge base.
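To make the multi-source formulation concrete, the following is a minimal sketch of how one training example might be laid out: two source sequences (question and answer) and one target sequence (the fact). The hotel question, the `MultiSourceExample` class, the `make_example` helper, and the `<arg1>`/`<rel>`/`<arg2>` marker tokens are all hypothetical illustrations and are not taken from NeurON itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MultiSourceExample:
    """One training example: two source sequences and one target sequence."""
    question_tokens: List[str]
    answer_tokens: List[str]
    target_tokens: List[str]


def make_example(question: str, answer: str,
                 fact: Tuple[str, str, str]) -> MultiSourceExample:
    # Assumed target format: tuple fields separated by marker tokens.
    subj, rel, obj = fact
    target = (["<arg1>"] + subj.split()
              + ["<rel>"] + rel.split()
              + ["<arg2>"] + obj.split())
    return MultiSourceExample(
        question_tokens=question.lower().split(),
        answer_tokens=answer.lower().split(),
        target_tokens=target,
    )


# Hypothetical QA pair: the answer alone carries none of the fact's content.
example = make_example(
    "Does the hotel have free parking ?",
    "Yes , it does .",
    ("hotel", "has", "free parking"),
)
print(example.target_tokens)
```

In this illustrative pair, none of the content words of the fact appear in the answer, which is why a model that reads only the answer sentence could not recover the tuple.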
