A Framework for Ontology Learning and Data-driven Change Discovery

In this paper we present Text2Onto, a framework for ontology learning from textual resources. Three main features distinguish Text2Onto from our earlier framework TextToOnto as well as from other state-of-the-art ontology learning frameworks. First, by representing the learned knowledge at a meta-level in the form of instantiated modeling primitives within a so-called Probabilistic Ontology Model (POM), we remain independent of a concrete target language while being able to translate the instantiated primitives into any (reasonably expressive) knowledge representation formalism. Second, user interaction is a core aspect of Text2Onto, and because the system calculates a confidence for each learned object, it becomes possible to design sophisticated visualizations of the POM. Third, by incorporating strategies for data-driven change discovery, we avoid processing the whole corpus from scratch each time it changes and instead selectively update the POM according to the corpus changes. Besides increasing efficiency, this also allows a user to trace the evolution of the ontology with respect to the changes in the underlying corpus.
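To make the combination of a confidence-annotated model and selective updates more concrete, the following is a minimal Python sketch, not the Text2Onto implementation. The class `ToyPOM`, the helper `extract`, the "X is a Y" pattern, and the confidence formula (evidence count relative to the best-supported primitive) are all assumptions made purely for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


def extract(text):
    """Hypothetical extractor: maps 'X is a Y' to a subclass-of primitive."""
    pairs = re.findall(r"(\w+) is an? (\w+)", text.lower())
    return Counter(("subclass-of", sub, sup) for sub, sup in pairs)


@dataclass
class ToyPOM:
    # Total evidence per instantiated primitive across the corpus.
    evidence: Counter = field(default_factory=Counter)
    # Evidence contributed by each document, kept so it can be retracted.
    per_doc: dict = field(default_factory=dict)

    def add_document(self, doc_id, text):
        """Add or replace one document without touching the others."""
        self.remove_document(doc_id)
        found = extract(text)
        self.per_doc[doc_id] = found
        self.evidence.update(found)

    def remove_document(self, doc_id):
        """Retract only the evidence that this document contributed."""
        old = self.per_doc.pop(doc_id, None)
        if old:
            self.evidence.subtract(old)
            self.evidence = +self.evidence  # drop primitives with no support

    def confidence(self, primitive):
        """Assumed measure: evidence relative to the best-supported primitive."""
        if not self.evidence:
            return 0.0
        return self.evidence[primitive] / max(self.evidence.values())


pom = ToyPOM()
pom.add_document("d1", "A dog is an animal. A cat is an animal.")
pom.add_document("d2", "The dog is an animal.")
dog = ("subclass-of", "dog", "animal")
cat = ("subclass-of", "cat", "animal")
conf_before = (pom.confidence(dog), pom.confidence(cat))
pom.remove_document("d2")  # corpus change: only d2's evidence is retracted
conf_after = (pom.confidence(dog), pom.confidence(cat))
```

Because each document's contribution is stored separately, a change to the corpus touches only the affected entries, and comparing `conf_before` with `conf_after` gives a simple trace of how the model evolved. The primitives stay language-neutral tuples, so a separate writer could translate them into a concrete formalism.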
