Open Information Extraction

Open Information Extraction (Open IE) systems aim to extract relation tuples at large scale and in a way that is portable across domains, by identifying a variety of relation phrases and their arguments in arbitrary sentences. The first generation of Open IE systems learns linear-chain models over unlexicalized features, such as part-of-speech (POS) or shallow syntactic tags, to label the words between a pair of candidate arguments and thereby identify extractable relations. Open IE is now in its second generation, which uses deeper linguistic analysis to extract instances of the most frequently observed relation types, such as Verb, Noun+Prep, Verb+Prep, and Infinitive. Second-generation systems exploit simple yet principled linguistic regularities in how verbs express relationships, for example through verb phrase-based extraction or clause-based extraction, and they achieve significantly higher performance than first-generation systems. In this paper, we present an overview of the two Open IE generations, including their strengths, weaknesses, and application areas.
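
To make "verb phrase-based extraction" concrete, here is a minimal sketch in the style of the syntactic constraint used by the ReVerb system, which matches relation phrases of the form V | V P | V W* P over POS tags. It is an illustration under stated assumptions, not the method of any particular system surveyed here: the input is assumed to be already POS-tagged with Penn Treebank tags, and the tag-to-class mapping, the "nearest noun phrase" argument heuristic, and the names coarse, noun_phrase, and extract are all hypothetical. ReVerb's lexical constraint and its other filters are omitted.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical input format: a pre-tagged sentence of (token, Penn Treebank tag) pairs.
Tagged = List[Tuple[str, str]]

def coarse(tag: str) -> str:
    """Map a Penn Treebank tag to a one-letter class (one character per token)."""
    if tag.startswith("VB"):
        return "V"  # verb
    if tag == "RP":
        return "R"  # particle
    if tag in ("IN", "TO"):
        return "P"  # preposition / infinitive marker
    if tag.startswith("RB"):
        return "A"  # adverb
    if tag.startswith("NN") or tag.startswith("PRP"):
        return "N"  # noun / pronoun
    if tag.startswith("JJ") or tag == "DT":
        return "J"  # adjective / determiner
    return "O"      # anything else

# V | V P | V W* P, where V = verb particle? adverb?, W = noun|adj|adv|pron|det,
# and P = prep|particle|infinitive marker ("V P" is the case W* = empty).
REL = re.compile(r"VR?A?(?:[NJA]*[PR])?")

def noun_phrase(classes: str, start: int, step: int) -> Optional[Tuple[int, int]]:
    """Grow a run of N/J tokens from `start` in direction `step`; require one noun."""
    i = start
    while 0 <= i < len(classes) and classes[i] in "NJ":
        i += step
    lo, hi = (start, i) if step > 0 else (i + 1, start + 1)
    return (lo, hi) if "N" in classes[lo:hi] else None

def extract(sentence: Tagged) -> List[Tuple[str, str, str]]:
    """Return (arg1, relation phrase, arg2) triples found in one tagged sentence."""
    classes = "".join(coarse(tag) for _, tag in sentence)
    words = [tok for tok, _ in sentence]
    triples = []
    for m in REL.finditer(classes):
        # Assumed heuristic: arguments are the noun phrases adjacent to the relation.
        left = noun_phrase(classes, m.start() - 1, -1)
        right = noun_phrase(classes, m.end(), 1)
        if left and right:
            triples.append((" ".join(words[left[0]:left[1]]),
                            " ".join(words[m.start():m.end()]),
                            " ".join(words[right[0]:right[1]])))
    return triples

sent = [("Faust", "NNP"), ("made", "VBD"), ("a", "DT"), ("deal", "NN"),
        ("with", "IN"), ("the", "DT"), ("devil", "NN"), (".", ".")]
print(extract(sent))  # [('Faust', 'made a deal with', 'the devil')]
```

Because each token maps to exactly one character, regex match offsets are token indices, which keeps the pattern readable. Unlike a first-generation linear-chain labeler, nothing here is learned: the relation phrase is determined entirely by the hand-written POS pattern.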
