PORE: Positive-Only Relation Extraction from Wikipedia Text

Extracting semantic relations is of great importance for creating Semantic Web content. It is highly beneficial to semi-automatically extract relations from the free text of Wikipedia, using the structured content readily available in it. Pattern-matching methods that exploit information redundancy do not work well here, because Wikipedia contains much less redundant information than the Web. Multi-class classification methods are also unsuitable, because Wikipedia provides no classification of relation types. In this paper, we propose PORE (Positive-Only Relation Extraction) for relation extraction from Wikipedia text. Its core algorithm, B-POL, extends a state-of-the-art positive-only learning algorithm with bootstrapping, strong negative identification, and transductive inference so that it works with fewer positive training examples. We conducted experiments on several relations with different amounts of training data. The experimental results show that B-POL works effectively given only a small number of positive training examples and significantly outperforms the original positive-only learning approaches and a multi-class SVM. Furthermore, although PORE is applied in the context of Wikipedia, the core algorithm B-POL is a general approach to Ontology Population and can be adapted to other domains.
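The abstract names the three ingredients B-POL adds to positive-only learning (strong negative identification, bootstrapping, and transductive inference) without spelling out how they interact. The Python sketch below shows one generic way such a loop can be organized; it is not the authors' B-POL. Assumptions: examples are dense feature vectors, a Rocchio-style nearest-centroid rule with cosine similarity serves as the base classifier, and the names `strong_negatives`, `pu_bootstrap`, `margin`, and `max_iter` are hypothetical.

```python
"""Hypothetical positive-only (PU) learning sketch; NOT the B-POL implementation.

Assumptions: dense feature vectors, cosine similarity, and a nearest-centroid
(Rocchio-style) base classifier chosen only for illustration.
"""
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    # Component-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def strong_negatives(pos: List[Vector], unl: List[Vector]) -> Set[int]:
    # Assumed heuristic: an unlabeled example is a "strong negative" if it is
    # more similar to the unlabeled centroid than to the positive centroid.
    cp, cu = centroid(pos), centroid(unl)
    return {i for i, x in enumerate(unl) if cosine(x, cu) > cosine(x, cp)}


def _train(pos: List[Vector], unl: List[Vector],
           pos_idx: Set[int], neg_idx: Set[int]) -> Tuple[List[float], List[float]]:
    # "Training" the nearest-centroid classifier means computing two centroids.
    cp = centroid(list(pos) + [unl[i] for i in pos_idx])
    cn = centroid([unl[i] for i in neg_idx])
    return cp, cn


def _score(x: Vector, cp: Vector, cn: Vector) -> float:
    # > 0: closer to the positive centroid; < 0: closer to the negative one.
    return cosine(x, cp) - cosine(x, cn)


def pu_bootstrap(pos: List[Vector], unl: List[Vector],
                 margin: float = 0.1, max_iter: int = 20) -> List[bool]:
    """Transductively label the unlabeled set itself (True = positive)."""
    neg_idx = strong_negatives(pos, unl)
    if not neg_idx:
        raise ValueError("no strong negatives found")
    pos_idx: Set[int] = set()
    for _ in range(max_iter):
        cp, cn = _train(pos, unl, pos_idx, neg_idx)
        rest = [i for i in range(len(unl)) if i not in pos_idx and i not in neg_idx]
        # Bootstrapping: only confidently classified examples join the training sets.
        new_pos = {i for i in rest if _score(unl[i], cp, cn) >= margin}
        new_neg = {i for i in rest if _score(unl[i], cp, cn) <= -margin}
        if not new_pos and not new_neg:
            break  # nothing confident left to add
        pos_idx |= new_pos
        neg_idx |= new_neg
    # Final pass: any example still unassigned gets the last classifier's label.
    cp, cn = _train(pos, unl, pos_idx, neg_idx)
    return [i in pos_idx or (i not in neg_idx and _score(x, cp, cn) >= 0)
            for i, x in enumerate(unl)]
```

For example, with `pos = [[1, 0], [0.9, 0.1]]` and `unl = [[1, 0.05], [0.05, 1], [0, 1], [0.95, 0]]`, the two vectors dominated by the second feature are picked as strong negatives, and `pu_bootstrap(pos, unl)` returns `[True, False, False, True]`. The `margin` parameter controls how cautious bootstrapping is: examples near the decision boundary are held back until the final pass rather than being allowed to shift the centroids early.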
