Conceptual Clustering: Concept Formation, Drift and Novelty Detection

The paper presents a clustering method that can be applied to populated ontologies to discover interesting groupings of the resources they contain. The method exploits a simple yet effective, language-independent semi-distance measure for individuals. The measure is based on the individuals' underlying semantics, evaluated along a number of dimensions that correspond to a set of concept descriptions (a committee of discriminating features). The clustering algorithm is a partitional method based on the notion of medoids w.r.t. the adopted semi-distance measure. Ultimately, it produces a hierarchical organization of groups of individuals. A final experiment demonstrates the validity of the approach using absolute quality indices. We propose two possible exploitations of these clusterings: concept formation and the detection of concept drift or novelty.
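As a rough illustration of the two ingredients named above, the following Python sketch pairs a feature-based semi-distance with a plain medoid-based partitioning loop. It is a minimal sketch under stated assumptions, not the method of the paper. Each individual is assumed to be already projected onto the feature committee as a vector with one value per concept description (1 if the individual is known to be an instance, 0 if it is known not to be, 0.5 if membership is unknown). The Minkowski-style formula, the encoding, and all names (`semi_distance`, `medoid`, `k_medoids`, the toy individuals) are hypothetical. The hierarchical organization mentioned in the abstract is not covered.

```python
from typing import Callable, Dict, List, Sequence


def semi_distance(u: Sequence[float], v: Sequence[float], p: int = 2) -> float:
    """Assumed semi-distance: Minkowski-style distance over feature projections,
    scaled by the number of features. Distinct individuals may be at distance 0."""
    m = len(u)
    return sum(abs(x - y) ** p for x, y in zip(u, v)) ** (1.0 / p) / m


def medoid(members: List[str], dist: Callable[[str, str], float]) -> str:
    """Return the member with the smallest total distance to the other members."""
    return min(members, key=lambda c: sum(dist(c, o) for o in members))


def k_medoids(items: Dict[str, Sequence[float]],
              init_medoids: List[str],
              max_iter: int = 100) -> Dict[str, List[str]]:
    """Alternate between assigning individuals to the nearest medoid and
    recomputing each cluster's medoid until the medoids stop changing."""
    def dist(a: str, b: str) -> float:
        return semi_distance(items[a], items[b])

    medoids = list(init_medoids)
    clusters: Dict[str, List[str]] = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for name in items:
            nearest = min(medoids, key=lambda m: dist(name, m))
            clusters[nearest].append(name)
        # Keep the old medoid if a cluster ended up empty (possible with ties).
        new_medoids = [medoid(clusters[m], dist) if clusters[m] else m
                       for m in medoids]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


# Toy data: four hypothetical individuals projected onto three features.
individuals = {
    "a": [1, 1, 0],
    "b": [1, 1, 0.5],
    "c": [0, 0, 1],
    "d": [0, 0.5, 1],
}
clusters = k_medoids(individuals, init_medoids=["a", "c"])
print(clusters)
```

In this sketch the medoid is always an actual individual, which is why only a (semi-)distance between individuals is needed and no notion of an average individual has to be defined.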
