Partitional Conceptual Clustering of Web Resources Annotated with Ontology Languages

This paper addresses cluster discovery in Semantic Web knowledge bases. We present a partitional clustering algorithm for grouping resources that are contained in knowledge bases and expressed in the standard ontology languages. The method exploits a language-independent semi-distance measure for individuals, based on the semantics of the resources with respect to a context represented by a set of concept descriptions (discriminating features). The clustering algorithm adapts the Bisecting k-Means method to work with medoids rather than centroids. In addition, we propose simple mechanisms for assigning each cluster an intensional definition, which may suggest new concepts for the knowledge base (vivification). An experimental evaluation assesses the validity of the approach using absolute quality indices for the clustering results.
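The two ingredients named above, a feature-based semi-distance and a bisecting procedure built on medoids, can be illustrated with a minimal sketch. It is not the paper's implementation, and several details are assumptions made for illustration only: each individual is taken to be already summarized as a vector of projection values (1 if it is known to belong to a feature concept, 0 if known not to, 0.5 if unknown), the dissimilarity takes a Minkowski-style form normalized by the number of features, and the largest cluster is the one chosen for splitting. The names `semi_distance`, `medoid`, `bisect` and `bisecting_k_medoids` are hypothetical.

```python
from itertools import combinations


def semi_distance(u, v, p=2):
    """Assumed Minkowski-style dissimilarity over feature projections,
    normalized by the number of features."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p) / len(u)


def medoid(cluster, dist):
    """Member with the smallest total dissimilarity to the other members."""
    return min(cluster, key=lambda c: sum(dist(c, o) for o in cluster))


def bisect(cluster, dist, max_iter=20):
    """Split one cluster in two around a pair of medoids."""
    if len(cluster) < 2:
        return list(cluster), []
    # Seed with the two most dissimilar members (an assumption of this sketch).
    m1, m2 = max(combinations(cluster, 2), key=lambda pair: dist(*pair))
    left, right = list(cluster), []
    for _ in range(max_iter):
        left = [x for x in cluster if dist(x, m1) <= dist(x, m2)]
        right = [x for x in cluster if dist(x, m1) > dist(x, m2)]
        if not right:  # members are indistinguishable under dist
            break
        n1, n2 = medoid(left, dist), medoid(right, dist)
        if (n1, n2) == (m1, m2):  # medoids are stable: converged
            break
        m1, m2 = n1, n2
    return left, right


def bisecting_k_medoids(items, dist, k):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    clusters = [list(items)]
    while len(clusters) < k:
        target = max(clusters, key=len)
        left, right = bisect(target, dist)
        if not right:  # nothing left to split
            break
        clusters.remove(target)
        clusters += [left, right]
    return clusters


# Hypothetical individuals, described by three discriminating features.
projections = {
    "a": (1, 1, 0),
    "b": (1, 1, 0.5),
    "c": (0, 0, 1),
    "d": (0, 0.5, 1),
}


def dist(x, y):
    return semi_distance(projections[x], projections[y])


clusters = bisecting_k_medoids(list(projections), dist, k=2)
print(clusters)
```

Medoids are actual members of the cluster, so the procedure needs only pairwise dissimilarities and never has to compute an average of individuals, which would have no obvious meaning for resources described in an ontology language.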
