Cluster Analysis Based on the Central Tendency Deviation Principle

Our main goal is to introduce three clustering functions based on the central tendency deviation principle. Under this approach, two objects are clustered together provided that their similarity is above a threshold. But how should this threshold be set? This paper gives some insights into this issue by extending clustering functions designed for categorical data to the more general case of continuous real-valued data. To solve the corresponding clustering problems approximately, we also propose a clustering algorithm whose complexity is linear in the number of objects and which does not require a predefined number of clusters. Our secondary purpose is to introduce a new experimental protocol for comparing clustering techniques. It uses four evaluation criteria and an aggregation rule for combining them. Finally, using fifteen data sets and this experimental protocol, we show the benefits of the introduced cluster analysis methods.
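The abstract does not state which similarity, which central tendency, or which algorithm is used, so the following is only a hypothetical sketch of the general idea, not the paper's method. It assumes the similarity is the inner product s_ij = <x_i, x_j> and that the threshold is the mean of all n*n pairwise similarities (which equals the squared norm of the mean vector). It then uses a single pass over the objects: each object joins the existing cluster where the sum of (s_ij - threshold) over the members is largest and positive, and otherwise opens a new cluster. Keeping one running sum vector per cluster makes each step cost O(p) per cluster, so the pass is linear in the number of objects for a bounded number of clusters, and no number of clusters is fixed in advance. The function name central_tendency_clustering is hypothetical.

```python
import numpy as np


def central_tendency_clustering(X: np.ndarray) -> np.ndarray:
    """Hypothetical single-pass clustering sketch (not the paper's algorithm).

    Assumptions: similarity s_ij = <x_i, x_j>; threshold = mean over all
    i, j of s_ij, which equals ||mean(X)||^2.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centre = X.mean(axis=0)
    threshold = float(centre @ centre)  # mean of all n*n inner products

    labels = np.empty(n, dtype=int)
    sums: list[np.ndarray] = []  # running sum of member vectors per cluster
    sizes: list[int] = []        # number of members per cluster

    for i in range(n):
        x = X[i]
        best, best_gain = -1, 0.0
        for k, (s, m) in enumerate(zip(sums, sizes)):
            # sum_{j in cluster k} (s_ij - threshold), computed in O(p)
            gain = float(x @ s) - threshold * m
            if gain > best_gain:
                best, best_gain = k, gain
        if best < 0:
            # no cluster with a positive deviation: open a new one
            sums.append(x.copy())
            sizes.append(1)
            labels[i] = len(sums) - 1
        else:
            sums[best] += x
            sizes[best] += 1
            labels[i] = best
    return labels
```

For example, with X = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]), the call central_tendency_clustering(X) returns [0 0 1 1]. Like other single-pass heuristics, the result can depend on the order in which objects are visited.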
