IKE - An Interactive Tool for Knowledge Extraction

Recent work on information extraction has suggested that fast, interactive tools can be highly effective; however, creating a usable system is challenging, and few publicly available tools exist. In this paper we present IKE, a new extraction tool that performs fast, interactive bootstrapping to develop high-quality extraction patterns for targeted relations. Central to IKE is the notion that an extraction pattern can be treated as a search query over a corpus. To operationalize this, IKE uses a novel query language that is expressive, easy to understand, and fast to execute, all essential requirements for a practical system. It is also the first interactive extraction tool to seamlessly integrate symbolic (Boolean) and distributional (similarity-based) methods for search. An initial evaluation suggests that relation tables can be populated substantially faster than by manual pattern authoring while retaining accuracy, and more reliably than fully automated tools, an important step towards practical KB construction. We are making IKE publicly available (http://allenai.org/software/interactive-knowledge-extraction).
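To make the idea of treating an extraction pattern as a search query concrete, the following is a minimal sketch in Python. It is not IKE's query language or implementation: the toy corpus, the regular expression, and the helper `run_query` are all hypothetical stand-ins chosen for illustration. The sketch runs a Hearst-style "X such as Y" pattern over a few sentences and collects the matches as rows of a two-column table.

```python
import re

# Toy corpus: a hypothetical stand-in for an indexed text collection.
CORPUS = [
    "Conductors such as copper carry electric current.",
    "Insulators such as rubber resist electric current.",
    "Copper is mined in Chile.",
]

# A Hearst-style pattern "<category> such as <instance>", written as a regex.
# This is NOT IKE's query language, only an illustration of the idea.
PATTERN = re.compile(r"\b(\w+) such as (\w+)\b")


def run_query(pattern, corpus):
    """Return (category, instance) rows for every match of the pattern."""
    rows = []
    for sentence in corpus:
        for match in pattern.finditer(sentence):
            rows.append((match.group(1).lower(), match.group(2).lower()))
    return rows


# Each match becomes one row of the relation table.
table = run_query(PATTERN, CORPUS)
```

In an interactive setting, a user would inspect the rows in `table`, accept or reject them, and refine the pattern; the sketch covers only the single query-and-collect step.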
