Evolutionary Conceptual Clustering of Semantically Annotated Resources

A clustering method is presented that can be applied to knowledge bases storing semantically annotated resources. The method can be used to discover groupings of structured objects expressed in the standard concept languages employed in the Semantic Web. It exploits effective, language-independent semi-distance measures over the space of resources. These measures are based on the semantics of the resources with respect to a number of dimensions, which correspond to a committee of features represented by a group of discriminating concept descriptions. We show how to obtain a maximally discriminating group of features through a feature construction procedure based on genetic programming. The evolutionary clustering algorithm employed is based on the notion of medoids applied to relational representations. It is able to induce an optimal set of clusters by means of a suitable fitness function based on the defined distance and the discernibility criterion. Experiments with real ontologies demonstrate the feasibility of our method.
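To make the feature-based semi-distance and the medoid step more concrete, here is a minimal Python sketch. It rests on several assumptions. A plain lookup table stands in for a Description Logic reasoner: each (feature, individual) pair maps to True when the knowledge base entails the concept assertion, False when it entails its negation, and None when neither is entailed (open-world assumption). Each individual is projected onto a feature as 1, 0, or 1/2 accordingly. The distance is a Minkowski-style combination normalized by the committee size. This is one common formulation of such semi-distances, not necessarily the exact definition used in the paper. The discernibility score and all function names (`projection`, `semi_distance`, `discernibility`, `assign_to_medoids`, `dispersion`) are hypothetical and only illustrative. The genetic-programming feature construction and the evolutionary search over candidate medoid sets are not shown.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a DL reasoner: (feature, individual) -> True if
# K |= F(a), False if K |= not F(a), None (or missing) if undecided.
Entailments = Dict[Tuple[str, str], Optional[bool]]


def projection(kb: Entailments, feature: str, ind: str) -> float:
    """Project an individual onto one feature: 1, 0, or 1/2 when undecided."""
    verdict = kb.get((feature, ind))
    if verdict is True:
        return 1.0
    if verdict is False:
        return 0.0
    return 0.5  # open world: neither F(a) nor not F(a) is entailed


def semi_distance(kb: Entailments, features: Sequence[str],
                  a: str, b: str, p: int = 2) -> float:
    """(1/|F|) * (sum_i |pi_i(a) - pi_i(b)|^p)^(1/p); may be 0 for a != b."""
    total = sum(abs(projection(kb, f, a) - projection(kb, f, b)) ** p
                for f in features)
    return total ** (1.0 / p) / len(features)


def discernibility(kb: Entailments, features: Sequence[str],
                   individuals: Sequence[str]) -> float:
    """Assumed score: mean per-feature separation over all pairs (higher is better)."""
    pairs = list(combinations(individuals, 2))
    if not pairs:
        raise ValueError("need at least two individuals")
    score = sum(abs(projection(kb, f, a) - projection(kb, f, b))
                for a, b in pairs for f in features)
    return score / (len(pairs) * len(features))


def assign_to_medoids(kb: Entailments, features: Sequence[str],
                      individuals: Sequence[str], medoids: Sequence[str],
                      p: int = 2) -> Dict[str, List[str]]:
    """Put each individual in the cluster of its closest medoid (first wins ties)."""
    clusters: Dict[str, List[str]] = {m: [] for m in medoids}
    for x in individuals:
        closest = min(medoids, key=lambda m: semi_distance(kb, features, x, m, p))
        clusters[closest].append(x)
    return clusters


def dispersion(kb: Entailments, features: Sequence[str],
               clusters: Dict[str, List[str]], p: int = 2) -> float:
    """Total distance of members to their medoid; a fitness would build on this."""
    return sum(semi_distance(kb, features, x, m, p)
               for m, members in clusters.items() for x in members)
```

In an evolutionary setting, each genome would encode a candidate set of medoids, and its fitness would typically reward low dispersion. A penalty on the number of medoids would also be needed, since making every individual a medoid trivially drives dispersion to zero. The exact fitness function is not reproduced here.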
