Improved Pattern Learning for Bootstrapped Entity Extraction

Bootstrapped pattern learning for entity extraction usually starts with seed entities and iteratively learns patterns and entities from unlabeled text. Patterns are scored by their ability to extract more positive entities and fewer negative entities. A problem is that, because labeled data is lacking, existing pattern scoring measures either assume unlabeled entities are negative or ignore them. In this paper, we improve pattern scoring by predicting the labels of unlabeled entities. We use various unsupervised features based on contrasting domain-specific and general text, and on exploiting distributional similarity and edit distances to learned entities. Our system outperforms existing pattern scoring algorithms for extracting drug-and-treatment entities from four medical forums.
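The idea of scoring a pattern with predicted labels for unlabeled entities can be illustrated with a minimal sketch. It is not the scoring measure used in the paper: it assumes a single feature (normalized edit distance to already labeled entities) and a simple averaging rule, and the names `edit_distance`, `predicted_positive_score`, and `score_pattern` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Edit distance mapped to [0, 1]; 1.0 means identical strings."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)


def predicted_positive_score(entity, positives, negatives):
    """Guess in [0, 1] of how likely an entity is positive.

    Assumption: labeled entities keep their labels; an unlabeled entity
    is scored by its closest positive versus its closest negative.
    """
    if entity in positives:
        return 1.0
    if entity in negatives:
        return 0.0
    pos = max((similarity(entity, p) for p in positives), default=0.0)
    neg = max((similarity(entity, n) for n in negatives), default=0.0)
    if pos + neg == 0:
        return 0.5  # no evidence either way
    return pos / (pos + neg)


def score_pattern(extracted, positives, negatives):
    """Average predicted-positive score over the entities a pattern extracts."""
    if not extracted:
        return 0.0
    total = sum(predicted_positive_score(e, positives, negatives)
                for e in extracted)
    return total / len(extracted)


if __name__ == "__main__":
    positives = {"ibuprofen", "aspirin"}
    negatives = {"doctor", "hospital"}
    # "ibuprofin" is unlabeled; it is neither treated as negative nor ignored.
    print(score_pattern(["aspirin", "ibuprofin"], positives, negatives))
```

Under these assumptions, an unlabeled misspelling such as "ibuprofin" contributes a score close to that of a positive entity rather than counting against the pattern. A fuller version would combine several features, such as the distributional similarity and domain-versus-general text contrasts that the abstract mentions.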
