An ontology-based cluster analysis framework

The main objective of this paper is to propose a conceptual and software environment in which different aspects of cluster analysis of ontology-based data can be studied. An ontology-based dataset has two core components: a description of categories, and a description of objects and the relationships between them. Similarity between objects is defined as an amalgamation function of taxonomic, relationship and attribute similarity. Different measures can be used to calculate each of these similarities, and further research is needed to evaluate them. A software tool that supports classification of ontology-based data and comprehensive analysis of the results is essential for research in ontology-based data mining. Such a tool should be universal, extensible and open. Universality means the ability to process any data set described in OWL, tailored to meet individual requirements. Extensibility means that the system can be enriched with new elements without changes to its main components. Openness enables communication with other data analysis systems. The paper first presents theoretical aspects of cluster analysis of ontology-based data sets, then outlines a framework for a cluster analysis system, and finally discusses some technical details of the system implementation.
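
To make the notion of an amalgamation function concrete, the following minimal sketch combines the three component similarities into one score. It assumes each component is already computed and lies in [0, 1], and it uses a normalized weighted mean as the amalgamation; the abstract does not fix a particular function, so this choice, the function name amalgamate and the weights parameter are all hypothetical.

```python
from typing import Sequence


def amalgamate(
    taxonomic: float,
    relationship: float,
    attribute: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Combine three component similarities into one overall similarity.

    Assumption: a normalized weighted mean is used as the amalgamation
    function. This is only one possible choice, not the paper's method.
    """
    components = (taxonomic, relationship, attribute)

    # Each component similarity is assumed to be scaled to [0, 1].
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError("component similarities must lie in [0, 1]")

    # Weights must be non-negative, one per component, and not all zero.
    if len(weights) != len(components):
        raise ValueError("exactly three weights are required")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")

    # Dividing by the weight total keeps the result in [0, 1].
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, components)) / total


if __name__ == "__main__":
    # Equal weights: plain average of the three similarities.
    print(amalgamate(1.0, 0.5, 0.0))
    # Taxonomic similarity counted twice as heavily as the others.
    print(amalgamate(1.0, 0.5, 0.0, weights=(2.0, 1.0, 1.0)))
```

Because the weights are normalized, the overall similarity stays in [0, 1] whenever the components do, which keeps results obtained with different weightings comparable when the individual measures are evaluated.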