Title-based Approach to Relation Discovery from Wikipedia

With the advent of the Web and the explosion of available textual data, domain ontology engineering has gained increasing importance. Over the last decade, several successful tools for automatically harvesting knowledge from web data have been developed, but the extraction of taxonomic and non-taxonomic ontological relations is still far from fully solved. This paper describes a new approach that extracts ontological relations from Wikipedia. Non-taxonomic relations are extracted by analyzing the titles that appear in each document of the studied corpus. The method relies on regular expressions applied to titles, from which we can extract not only the two arguments of each relation but also the label that describes it. The resulting set of labels is then used to retrieve new relations by analyzing the title hierarchy of each document. Further relations are extracted from titles and subtitles that contain only one term. Finally, an enrichment step considers each term that appears as an argument of an extracted relation in order to discover new concepts and new relations. The experiments were performed on French Wikipedia articles related to the medical field. The precision and recall values are encouraging and seem to validate
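The extraction steps summarized in this abstract (pattern matching on titles, reuse of the learned labels over the title hierarchy, relations from single-term titles and subtitles, and enrichment from relation arguments) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the actual regular expressions, so the single pattern in `TITLE_PATTERNS`, the `is_term` heuristic (a short heading that is neither a known label nor a pattern match), the generic label `related_to` for single-term links, and the `fetch_article` callback used for enrichment are all hypothetical. Each article is modeled as a tree of `Heading` nodes, and English toy titles stand in for the French articles.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# HYPOTHETICAL title pattern: the abstract does not list the real expressions.
# It captures a relation label and two arguments, e.g.
# "Effects of smoking on pregnancy" -> (smoking, effects, pregnancy).
TITLE_PATTERNS = [
    re.compile(r"^(?P<label>\w+) of (?P<arg1>[\w ]+?) (?:on|in|with) (?P<arg2>[\w ]+)$",
               re.IGNORECASE),
]


@dataclass(frozen=True, order=True)
class Relation:
    arg1: str
    label: str
    arg2: str


@dataclass
class Heading:
    """One node of an article's title hierarchy (article title, sections, subsections)."""
    title: str
    children: list[Heading] = field(default_factory=list)


def match_title(title: str) -> Relation | None:
    """Step 1: apply the title patterns to a single heading."""
    for pattern in TITLE_PATTERNS:
        m = pattern.match(title.strip())
        if m:
            return Relation(m["arg1"].lower(), m["label"].lower(), m["arg2"].lower())
    return None


def is_term(title: str, labels: set[str]) -> bool:
    """HYPOTHETICAL heuristic: a short heading that is neither a known label nor a pattern match."""
    t = title.strip()
    return 0 < len(t.split()) <= 3 and t.lower() not in labels and match_title(t) is None


def walk(node: Heading):
    """Yield every heading of the tree in depth-first order."""
    yield node
    for child in node.children:
        yield from walk(child)


def extract(root: Heading, labels: set[str]) -> set[Relation]:
    """Extract relations from one article; `labels` is shared across articles and grows."""
    relations: set[Relation] = set()
    for node in walk(root):  # Step 1: pattern-based extraction, collecting labels
        rel = match_title(node.title)
        if rel:
            relations.add(rel)
            labels.add(rel.label)

    def visit(node: Heading, term: str | None) -> None:
        key = node.title.strip().lower()
        node_is_term = is_term(node.title, labels)
        for child in node.children:
            if is_term(child.title, labels):
                child_key = child.title.strip().lower()
                if key in labels and term:
                    # Step 2: "<term> > <known label> > <child term>" -> (term, label, child)
                    relations.add(Relation(term, key, child_key))
                elif node_is_term:
                    # Step 3: single-term title over single-term subtitle; HYPOTHETICAL label
                    relations.add(Relation(key, "related_to", child_key))
            # Pass down the nearest ancestor heading that is a single term.
            visit(child, key if node_is_term else term)

    visit(root, None)
    return relations


def enrich(seed: str, fetch_article, max_articles: int = 50) -> set[Relation]:
    """Enrichment: re-run extraction on the article of every newly found argument.
    `fetch_article(term) -> Heading | None` is a HYPOTHETICAL lookup (e.g. a local dump)."""
    labels: set[str] = set()
    relations: set[Relation] = set()
    seen: set[str] = set()
    frontier = [seed]
    while frontier and len(seen) < max_articles:
        term = frontier.pop()
        if term in seen:
            continue
        seen.add(term)
        root = fetch_article(term)
        if root is None:
            continue
        for rel in extract(root, labels) - relations:
            relations.add(rel)
            frontier.extend([rel.arg1, rel.arg2])
    return relations
```

On a toy article "Diabetes" whose sections are "Effects of diabetes on pregnancy", "Effects" (with subsections "Retinopathy" and "Neuropathy") and "Treatment" (with subsection "Insulin"), `extract` finds (diabetes, effects, pregnancy) through the pattern, reuses the learned label "effects" to add (diabetes, effects, retinopathy) and (diabetes, effects, neuropathy), and adds generic `related_to` links from diabetes to treatment and from treatment to insulin. Sharing the `labels` set across articles in `enrich` is what lets a label learned in one document be applied to the title hierarchy of another.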
