SPIED: Stanford Pattern-based Information Extraction and Diagnostics

This paper aims to provide an effective interface for progressive refinement of pattern-based information extraction systems. Pattern-based information extraction (IE) systems have an advantage over machine learning based systems: their patterns are interpretable by humans and easy to customize to cope with errors. Building a pattern-based system is usually an iterative process of trying different parameters and thresholds to learn patterns and entities with high precision and recall. Because patterns are interpretable, it is possible to identify sources of errors, such as patterns responsible for extracting incorrect entities and vice versa, and to correct them. However, doing so requires time-consuming manual inspection of the extracted output. We present a lightweight tool, SPIED, to aid IE system developers in learning entities using patterns with bootstrapping, and in visualizing the learned entities and patterns with explanations. To the best of our knowledge, SPIED is the first publicly available tool to visualize diagnostic information from multiple pattern learning systems.
