Unsupervised Ontology Acquisition from Plain Texts: The OntoGain System

We propose OntoGain, a system for unsupervised ontology acquisition from unstructured text that relies on multi-word term extraction. To acquire taxonomic relations, we exploit the lexical information inherent in multi-word terms in a comparative implementation of agglomerative hierarchical clustering and formal concept analysis. To detect non-taxonomic relations, we compare an association-rule-based algorithm with a probabilistic algorithm. OntoGain can transform the derived ontology into standard OWL statements. We compare OntoGain's results both to hand-crafted ontologies and to a state-of-the-art system in two domains: medicine and computer science.
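As a rough illustration of how lexical information in multi-word terms could drive agglomerative clustering, the sketch below groups terms by the words they share. It is not the OntoGain implementation: the Dice similarity over word sets, the group-average linkage, the `threshold` parameter, and the function names `dice`, `average_link`, and `cluster` are all assumptions made for this example.

```python
from itertools import combinations


def dice(a, b):
    """Dice coefficient between the word sets of two terms (assumed measure)."""
    wa, wb = set(a.split()), set(b.split())
    return 2 * len(wa & wb) / (len(wa) + len(wb))


def average_link(c1, c2):
    """Group-average similarity between two clusters of terms."""
    return sum(dice(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster(terms, threshold=0.3):
    """Merge the most similar pair of clusters until similarity drops below threshold.

    Returns the final clusters and the list of merges (left, right, similarity),
    which records the hierarchy bottom-up.
    """
    clusters = [(t,) for t in terms]  # start with one cluster per term
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_link(clusters[p[0]], clusters[p[1]]),
        )
        sim = average_link(clusters[i], clusters[j])
        if sim < threshold:
            break  # nothing left that is similar enough to merge
        merges.append((clusters[i], clusters[j], sim))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


if __name__ == "__main__":
    terms = [
        "neural network",
        "recurrent neural network",
        "blood pressure",
        "high blood pressure",
    ]
    final, history = cluster(terms)
    for group in final:
        print(group)
```

With these four example terms, "recurrent neural network" ends up grouped with "neural network" and "high blood pressure" with "blood pressure", because each pair shares its head words. A real system would also need a rule for labeling the parent concept of each merged cluster, which this sketch leaves out.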
