Boosting Relation Extraction with Limited Closed-World Knowledge

This paper presents a new approach to improving relation extraction based on minimally supervised learning. Precision can be considerably improved by adding a limited amount of closed-world knowledge to the usual seed data and using it to estimate the confidence of learned rules. Starting from an existing baseline system, we demonstrate that limited closed-world knowledge can effectively eliminate "dangerous" or plainly wrong rules during the bootstrapping process. The new method improves both the reliability of the confidence estimation and the precision of the extracted instances. Recall suffers to a certain degree, depending on the domain and the selected settings. Even so, the overall performance measured by F-score improves considerably. Finally, we validate the adaptability of the best-performing ranking method to a new domain and obtain promising results.
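The abstract describes the mechanism only at a high level. The Python sketch below shows one possible reading of "closed-world knowledge for confidence estimation." It assumes that, for some entities, the complete set of their relation partners is known. Under that assumption, any extraction a rule makes about such an entity can be judged right or wrong. Rules whose judgeable extractions are mostly wrong are then discarded. The data representation, the names `Rule`, `closed_world_confidence` and `filter_rules`, and the 0.5 threshold are all hypothetical. They illustrate the idea and are not the paper's actual scoring method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: for each entity covered by the closed-world
# knowledge, the *complete* set of entities it stands in the target relation to.
ClosedWorld = Dict[str, Set[str]]


@dataclass
class Rule:
    """A learned extraction rule together with the instances it extracted."""
    name: str
    extractions: Set[Tuple[str, str]]  # (argument 1, argument 2) pairs


def closed_world_confidence(rule: Rule, closed_world: ClosedWorld) -> Optional[float]:
    """Share of judgeable extractions that agree with the closed-world knowledge.

    An extraction is judgeable only if its first argument is covered by the
    closed-world knowledge. Because that knowledge is assumed complete for
    covered entities, any partner outside the known set counts as wrong.
    Returns None when the rule produced no judgeable extraction.
    """
    correct = wrong = 0
    for arg1, arg2 in rule.extractions:
        if arg1 not in closed_world:
            continue  # open world: nothing can be concluded about this instance
        if arg2 in closed_world[arg1]:
            correct += 1
        else:
            wrong += 1
    judged = correct + wrong
    return correct / judged if judged else None


def filter_rules(rules: List[Rule], closed_world: ClosedWorld,
                 threshold: float = 0.5) -> List[Rule]:
    """Drop rules whose closed-world confidence falls below the threshold.

    Rules that cannot be judged are kept, leaving their fate to whatever
    other confidence measure the bootstrapping loop uses.
    """
    kept = []
    for rule in rules:
        confidence = closed_world_confidence(rule, closed_world)
        if confidence is None or confidence >= threshold:
            kept.append(rule)
    return kept
```

As an example, take the closed-world knowledge `{"ann": {"bob"}}`. A rule that extracted `("ann", "bob")` and `("ann", "carl")` receives a confidence of 0.5. Discarding low-confidence rules removes the wrong instances they would produce. It can also discard some correct ones, which matches the precision/recall trade-off described above.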
