Open Information Extraction from Conjunctive Sentences

We develop CALM, a coordination analyzer that improves upon the conjuncts identified from dependency parses. It uses language-model-based scoring and several linguistic constraints to search over hierarchical conjunct boundaries (for nested coordination). By splitting a conjunctive sentence around these conjuncts, CALM outputs several simple sentences. We demonstrate the value of our coordination analyzer on the end task of Open Information Extraction (Open IE). State-of-the-art Open IE systems lose substantial yield because they process conjunctive sentences ineffectively. Our Open IE system, CALMIE, performs extraction over the simple sentences identified by CALM and obtains up to 1.8x the yield, with a moderate increase in precision, compared to extraction from the original sentences.
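To make the splitting step concrete, the following is a minimal sketch of how simple sentences could be generated once hierarchical conjunct boundaries are already known. It is not CALM's implementation: the language-model scoring and linguistic constraints used to find the boundaries are not shown, and the `Coord` class and the `expand` and `simple_sentences` functions are hypothetical names introduced only for illustration. The sketch also assumes that every conjunct distributes over the rest of the sentence, which may not hold for all coordinations.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Union


@dataclass
class Coord:
    """Hypothetical container: one coordination with its conjuncts.

    Each conjunct is a list of nodes, so a conjunct may itself
    contain a nested Coord (hierarchical coordination).
    """
    conjuncts: List[List["Node"]]


Node = Union[str, Coord]


def expand(nodes: List[Node]) -> List[List[str]]:
    """Return every token sequence obtained by choosing one conjunct
    per coordination, recursing into nested coordinations."""
    options = []
    for node in nodes:
        if isinstance(node, Coord):
            # Collect all expansions of all conjuncts as alternatives.
            alternatives = []
            for conjunct in node.conjuncts:
                alternatives.extend(expand(conjunct))
            options.append(alternatives)
        else:
            # A plain token has exactly one alternative: itself.
            options.append([[node]])
    # Combine one alternative from each position, then flatten.
    return [[tok for part in combo for tok in part]
            for combo in product(*options)]


def simple_sentences(nodes: List[Node]) -> List[str]:
    """Join each expanded token sequence into a sentence string."""
    return [" ".join(tokens) for tokens in expand(nodes)]


# "She bought apples and red or green grapes", with the coordinator
# tokens dropped and the assumed nesting made explicit.
sentence = [
    "She", "bought",
    Coord([
        ["apples"],
        [Coord([["red"], ["green"]]), "grapes"],
    ]),
]

for s in simple_sentences(sentence):
    print(s)
```

Under these assumptions the nested example yields three simple sentences, one per leaf conjunct, each of which could then be passed to an Open IE extractor on its own.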
