Conceptual Clustering and Its Application to Concept Drift and Novelty Detection

The paper presents a clustering method that can be applied to populated ontologies to discover interesting groupings of the resources they contain. The method exploits a simple yet effective, language-independent semi-distance measure for individuals. The measure is based on their underlying semantics and on a number of dimensions corresponding to a set of concept descriptions (a discriminating features committee). The clustering algorithm is partitional and is based on the notion of medoids with respect to the adopted semi-distance measure. Ultimately, it produces a hierarchical organization of groups of individuals. A final experiment demonstrates the validity of the approach using absolute quality indices. We propose two possible uses of these clusterings: concept formation, and the detection of concept drift or novelty.
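The abstract does not spell out either the measure or the algorithm. The following is therefore a minimal Python sketch built on stated assumptions, not the authors' code.

The first assumption concerns how individuals are represented. Each individual is reduced to its projections onto the k committee concepts. A projection is 1 if the knowledge base entails membership, 0 if it entails non-membership, and 0.5 if neither is entailed.

The second assumption is the form of the semi-distance. It is taken to be a normalized Minkowski-style distance over those projections.

The third assumption is how the hierarchy arises. It is obtained by repeatedly bisecting the least compact cluster around two medoids.

All function names, the projection data, and the bisecting driver are hypothetical. A real system would compute the projections with a DL reasoner.

```python
from itertools import combinations

# Hypothetical projections: one value per committee concept F_1..F_k.
# 1.0 = K entails F_i(x), 0.0 = K entails not F_i(x), 0.5 = unknown.


def semi_distance(a, b, p=2):
    """Assumed form: (1/k) * (sum_i |a_i - b_i|^p)^(1/p).

    Distinct individuals with identical projections get distance 0,
    which is why this is only a semi-distance (pseudometric).
    """
    k = len(a)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p) / k


def medoid(cluster, dist):
    """Member with the smallest total distance to the rest of the cluster."""
    return min(cluster, key=lambda m: sum(dist(m, x) for x in cluster))


def spread(cluster, dist):
    """Total distance from the medoid; lower means a more compact cluster."""
    m = medoid(cluster, dist)
    return sum(dist(m, x) for x in cluster)


def two_medoids(cluster, dist, max_iter=100):
    """Split one cluster around two medoids (alternating k-medoids, k=2)."""
    # Seed with the two most distant members.
    m1, m2 = max(combinations(cluster, 2), key=lambda pair: dist(*pair))
    for _ in range(max_iter):
        # Assign each member to its closer medoid (ties go to the left).
        left = [x for x in cluster if dist(x, m1) <= dist(x, m2)]
        right = [x for x in cluster if dist(x, m1) > dist(x, m2)]
        new1, new2 = medoid(left, dist), medoid(right, dist)
        if (new1, new2) == (m1, m2):
            break  # medoids are stable: the partition has converged
        m1, m2 = new1, new2
    return left, right


def bisecting_medoids(individuals, dist, k):
    """Hypothetical top-down driver producing up to k clusters.

    Repeatedly bisects the least compact cluster; each split is recorded
    as (parent, left, right), which yields a binary hierarchy.
    """
    clusters, hierarchy = [list(individuals)], []
    while len(clusters) < k:
        splittable = [c for c in clusters if spread(c, dist) > 0]
        if not splittable:
            break  # every remaining cluster is indistinguishable
        worst = max(splittable, key=lambda c: spread(c, dist))
        left, right = two_medoids(worst, dist)
        clusters.remove(worst)
        clusters.extend([left, right])
        hierarchy.append((worst, left, right))
    return clusters, hierarchy


proj = {
    "alice": (1.0, 0.0, 1.0),
    "bob": (1.0, 0.0, 0.5),
    "carol": (0.0, 1.0, 0.0),
    "dave": (0.0, 1.0, 0.5),
    "eve": (1.0, 0.5, 0.5),
}
clusters, tree = bisecting_medoids(
    proj, lambda x, y: semi_distance(proj[x], proj[y]), k=2
)
print(clusters)  # [['alice', 'bob', 'eve'], ['carol', 'dave']]
```

With k=2, the single recorded split separates the individuals whose projections resemble carol's from those close to alice. Raising k keeps bisecting whichever cluster is least compact, which extends the binary hierarchy.

Bisecting is only one way a medoid-based partitional method can yield a hierarchy. The paper's own procedure and its quality indices are not reproduced here.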
