Ontology Maintenance using Textual Analysis

Ontologies continually face the problem of evolution. Because the required changes are complex, a maintenance process that is at least semi-automatic is increasingly necessary to ease this task and ensure its reliability. In this paper, we propose an ontology maintenance model for a domain. Its originality is that it is language independent and relies on a sequence of text-processing steps to extract highly related terms from a corpus. First, we apply a document classification technique, using GRAMEXCO, to generate classes of text segments that carry a similar type of information and to identify their shared lexicon, which is taken to be highly related to a single topic. This technique provides a first general and robust exploration of the corpus. Next, we apply the Latent Semantic Indexing method to extract from this shared lexicon the most strongly associated terms, which an expert must then examine to confirm their relevance before updating the current ontology. Finally, we show how the complementarity of these two techniques, which rest on cognitive foundations, constitutes a powerful refinement process.
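The Latent Semantic Indexing step can be illustrated with a minimal sketch. It is not the authors' implementation: the GRAMEXCO classification is not reproduced, and the text segments, the shared lexicon, the raw-count weighting, the rank k = 2, and the use of cosine similarity are all assumptions made for illustration. The function names are hypothetical. The sketch builds a term-by-segment matrix restricted to the shared lexicon, truncates its singular value decomposition, and ranks the lexicon terms by their similarity to a query term in the reduced space.

```python
import numpy as np

def term_segment_matrix(segments, lexicon):
    """Rows are lexicon terms, columns are text segments, values are raw counts."""
    index = {term: i for i, term in enumerate(lexicon)}
    matrix = np.zeros((len(lexicon), len(segments)))
    for j, segment in enumerate(segments):
        for token in segment.lower().split():
            if token in index:
                matrix[index[token], j] += 1.0
    return matrix

def lsi_term_vectors(matrix, k):
    """Keep the k largest singular values and return term coordinates U_k * S_k."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def most_associated(term, lexicon, vectors, top_n=3):
    """Rank the other lexicon terms by cosine similarity in the reduced space."""
    i = lexicon.index(term)
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = vectors / norms[:, None]
    sims = unit @ unit[i]
    order = [j for j in np.argsort(-sims) if j != i]
    return [(lexicon[j], float(sims[j])) for j in order[:top_n]]

# Hypothetical class of segments and its shared lexicon (invented data).
segments = [
    "ontology concept relation concept",
    "ontology concept relation",
    "corpus term segment",
    "corpus term segment term",
    "ontology relation corpus",
]
lexicon = ["ontology", "concept", "relation", "corpus", "term", "segment"]

matrix = term_segment_matrix(segments, lexicon)
vectors = lsi_term_vectors(matrix, k=2)
for candidate, score in most_associated("ontology", lexicon, vectors):
    print(f"{candidate}: {score:.3f}")
```

In a setting like the one described, the ranked candidates would be shown to a domain expert, who decides whether they justify an update to the ontology. The ranking alone does not modify anything.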
