Enhancing Relation Extraction by Eliciting Selectional Constraint Features from Wikipedia

Selectional constraints are commonly checked when detecting semantic relations. Previous work usually defined these constraints manually on top of a handcrafted concept taxonomy, which is time-consuming and impractical for large-scale relation extraction. Furthermore, determining entity types against such a taxonomy (e.g., by named entity recognition, NER) does not achieve sufficiently high accuracy. In this paper, we propose a novel approach to extracting relation instances using features elicited from Wikipedia, a free online encyclopedia. These features are represented as selectional constraints and then employed to enhance relation extraction. We conduct case studies on validating the extracted instances of two common relations, hasArtist(album, artist) and hasDirector(film, director), and obtain high extraction precision (around 0.95) and validation accuracy (near 0.90).
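To make the idea of checking selectional constraints concrete, the Python sketch below accepts an extracted relation instance only if both arguments carry an acceptable type feature. It is a minimal illustration under stated assumptions, not the authors' implementation. The feature vocabulary (e.g. "album", "band", "film director"), the toy `ENTITY_FEATURES` table standing in for features elicited from Wikipedia, and the rule that an argument passes if it shares at least one feature with its slot's allowed set are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical selectional constraints: for each relation, the set of
# acceptable features for (arg1, arg2). Feature names are illustrative only.
CONSTRAINTS: dict[str, tuple[set[str], set[str]]] = {
    "hasArtist": ({"album"}, {"musician", "band", "singer"}),
    "hasDirector": ({"film"}, {"film director"}),
}

# Toy stand-in for features elicited from Wikipedia (assumed, e.g. derived
# from category labels); real features would be mined, not hand-written.
ENTITY_FEATURES: dict[str, set[str]] = {
    "Abbey Road": {"album"},
    "The Beatles": {"band"},
    "Titanic": {"film"},
    "James Cameron": {"film director", "screenwriter"},
}


@dataclass(frozen=True)
class Instance:
    relation: str
    arg1: str
    arg2: str


def satisfies(features: set[str], allowed: set[str]) -> bool:
    """An argument satisfies its slot if it carries at least one allowed feature."""
    return bool(features & allowed)


def validate(inst: Instance, entity_features: dict[str, set[str]]) -> bool:
    """Accept an instance only if both arguments meet the relation's constraints."""
    if inst.relation not in CONSTRAINTS:
        return False  # no constraints known for this relation
    allowed1, allowed2 = CONSTRAINTS[inst.relation]
    f1 = entity_features.get(inst.arg1, set())
    f2 = entity_features.get(inst.arg2, set())
    return satisfies(f1, allowed1) and satisfies(f2, allowed2)


if __name__ == "__main__":
    print(validate(Instance("hasArtist", "Abbey Road", "The Beatles"), ENTITY_FEATURES))
    print(validate(Instance("hasDirector", "Titanic", "The Beatles"), ENTITY_FEATURES))
```

Here hasArtist(Abbey Road, The Beatles) passes, while hasDirector(Titanic, The Beatles) is rejected because "The Beatles" lacks a director feature. The any-overlap test is the simplest possible choice; a weighted or probabilistic match would be a natural refinement.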
