Creating a Large Benchmark for Open Information Extraction

Open information extraction (Open IE) was presented as an unrestricted variant of traditional information extraction. It has been gaining substantial attention, manifested by a large number of automatic Open IE extractors and downstream applications. In spite of this broad attention, the Open IE task definition has been lacking: there are no formal guidelines and no large-scale gold standard annotation. Consequently, the various implementations of Open IE resorted to small-scale post-hoc evaluations, inhibiting an objective and reproducible cross-system comparison. In this work, we develop a methodology that leverages the recent QA-SRL annotation to create a first independent and large-scale Open IE annotation, and use it to automatically compare the most prominent Open IE systems.
