Exploratory Relation Extraction in Large Text Corpora

In this paper, we propose and demonstrate Exploratory Relation Extraction (ERE), a novel approach to identifying and extracting relations from large text corpora based on user-driven and data-guided incremental exploration. We draw upon ideas from the information-seeking paradigm of Exploratory Search (ES) to enable an exploration process in which users begin with a vaguely defined information need and progressively sharpen their definition of extraction tasks as they identify relations of interest in the underlying data. This process extends the application of Relation Extraction to use cases characterized by imprecise information needs and uncertainty regarding the information content of available data. We present an interactive workflow that allows users to build extractors based on entity types and human-readable extraction patterns derived from subtrees in dependency trees. To evaluate the viability of our approach on large text corpora, we conduct experiments on a dataset of over 160 million sentences with mentions of over 6 million Freebase entities extracted from the ClueWeb09 corpus. Our experiments indicate that even non-expert users can intuitively use our approach to identify relations and create high-precision extractors with minimal effort.
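
To make the notion of an extractor more concrete, the following is a minimal sketch of an extractor defined by entity type restrictions plus a set of human-readable patterns. It is an illustration under stated assumptions, not the system described in the paper: the names `Candidate`, `Extractor`, `suggest_patterns`, and `extract`, the pattern strings such as `[X] born in [Y]`, and the toy data are all hypothetical, and patterns are assumed to be already derived from dependency subtrees by an upstream parsing step.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One entity pair found in a sentence (hypothetical representation)."""
    subj: str
    subj_type: str
    obj: str
    obj_type: str
    pattern: str  # assumed: readable rendering of the dependency subtree linking the pair


@dataclass(frozen=True)
class Extractor:
    """Entity type restrictions plus the patterns a user has selected."""
    subj_type: str
    obj_type: str
    patterns: frozenset

    def matches(self, c: Candidate) -> bool:
        return (
            c.subj_type == self.subj_type
            and c.obj_type == self.obj_type
            and c.pattern in self.patterns
        )


def suggest_patterns(candidates, subj_type, obj_type):
    """Data-guided step: rank patterns observed for a chosen pair of entity types."""
    counts = Counter(
        c.pattern
        for c in candidates
        if c.subj_type == subj_type and c.obj_type == obj_type
    )
    return counts.most_common()


def extract(candidates, extractor):
    """Apply an extractor and return the matching (subject, object) pairs."""
    return [(c.subj, c.obj) for c in candidates if extractor.matches(c)]


# Toy data, invented for illustration only.
CANDIDATES = [
    Candidate("Albert Einstein", "Person", "Ulm", "Location", "[X] born in [Y]"),
    Candidate("Marie Curie", "Person", "Warsaw", "Location", "[X] born in [Y]"),
    Candidate("Marie Curie", "Person", "Paris", "Location", "[X] die in [Y]"),
    Candidate("Google", "Organization", "YouTube", "Organization", "[X] acquire [Y]"),
]

# User-driven step: start from entity types, inspect suggestions, pick a pattern.
suggestions = suggest_patterns(CANDIDATES, "Person", "Location")
born_in = Extractor("Person", "Location", frozenset({"[X] born in [Y]"}))
pairs = extract(CANDIDATES, born_in)
```

In this sketch, exploration amounts to alternating between `suggest_patterns` (seeing what the data offers for a type pair) and refining the pattern set of an `Extractor`; a real system would rank suggestions with more than raw counts.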
