Context-based Hierarchical Clustering for the Ontology Learning

Ontologies provide a common layer that plays a major role in supporting information exchange and sharing. In this paper, we focus on the extraction of ontological concepts from HTML documents. To improve this process, we propose an unsupervised hierarchical clustering algorithm, "contextual ontological concept extraction" (COCE), which applies the k-means partitioning algorithm incrementally and is guided by a structural context. This context exploits the HTML structure and the location of words to select the semantically closest co-occurrents of each word and to improve word weighting. Guided by this context definition, we perform an incremental clustering that refines the context of each word cluster in order to extract semantically meaningful concepts. The COCE algorithm can be run either fully automatically or interactively with a user. We evaluate our algorithm on HTML documents from the tourism domain. Our results show that the context-based algorithm, which implements an incremental process and a successive refinement of clusters, improves the conceptual quality of the clusters and the relevance of the extracted ontological concepts.
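
The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the COCE algorithm itself: the tag weights in `TAG_WEIGHTS`, the size-based stopping rule `max_size`, and all function names are assumptions made for this illustration only. Words are represented by co-occurrence vectors weighted by the HTML element in which the co-occurrence is seen, and a plain k-means step is reapplied to any cluster that is still considered too large.

```python
import random
from collections import defaultdict

# Hypothetical weights: co-occurrences in more prominent HTML elements count more.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "p": 1.0}

def context_vectors(blocks):
    """blocks: list of (tag, words). Returns word -> {co-occurrent: weight}."""
    vectors = defaultdict(lambda: defaultdict(float))
    for tag, words in blocks:
        weight = TAG_WEIGHTS.get(tag, 1.0)
        for a in words:
            for b in words:
                if a != b:
                    vectors[a][b] += weight
    return {word: dict(v) for word, v in vectors.items()}

def to_dense(vectors):
    """Turn sparse co-occurrence dicts into equal-length lists over a shared vocabulary."""
    vocab = sorted(vectors)
    return {w: [vectors[w].get(c, 0.0) for c in vocab] for w in vocab}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(items, dense, k, rng, iters=20):
    """Plain k-means over the words in `items`; returns the non-empty clusters."""
    centers = [dense[w] for w in rng.sample(items, k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w in items:
            nearest = min(range(k), key=lambda i: sq_dist(dense[w], centers[i]))
            clusters[nearest].append(w)
        # Keep the old center when a cluster received no words.
        centers = [mean([dense[w] for w in c]) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def incremental_clustering(dense, max_size=3, k=2, seed=0):
    """Re-split clusters with k-means until each holds at most `max_size` words
    (an assumed stopping rule), or cannot be split any further."""
    rng = random.Random(seed)
    pending, final = [sorted(dense)], []
    while pending:
        cluster = pending.pop()
        if len(cluster) <= max_size:
            final.append(cluster)
            continue
        parts = kmeans(cluster, dense, min(k, len(cluster)), rng)
        if len(parts) < 2:
            final.append(cluster)  # identical vectors: no further split possible
        else:
            pending.extend(parts)
    return final
```

In an interactive setting, the size test could be replaced by a user's decision on whether a given cluster should be refined further; the sketch only covers the automatic case.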
