In this paper we review several metrics that can be proposed as allocation functions in the Dynamic Clustering Algorithm (DCA) when the data are distributions or histograms of values. The choice of the most suitable distance plays a central role in the DCA because it defines the criterion function that is optimized. Moreover, the distance has to be consistent with the prototype that represents each cluster. Accordingly, for each proposed metric we identify the corresponding prototype by minimizing the criterion function, which yields the best fit between the partition and the representation of the clusters. Finally, we focus on a Wasserstein-based distance and show that it is optimal for partitioning a set of histogram data when each cluster is represented by its barycenter, itself expressed as a distribution.
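The interplay between the allocation function and the cluster prototype can be illustrated with a small sketch. This is not the authors' implementation: it assumes the L2 Wasserstein distance (also known as the Mallows distance) between one-dimensional distributions, represents each histogram by its quantile function sampled on a common grid of probability levels, and assumes that mass is uniform inside each bin and that every bin weight is strictly positive. Under this distance, the prototype that minimizes the within-cluster sum of squared distances is the distribution whose quantile function is the average of the members' quantile functions, so the representation step reduces to an average. The helper names `quantiles_from_histogram`, `wasserstein2_sq` and `dynamic_clustering` are hypothetical.

```python
import numpy as np


def quantiles_from_histogram(edges, weights, t):
    """Quantile function of a histogram at probability levels t.

    Hypothetical helper: assumes mass is spread uniformly inside each bin
    and that every bin weight is strictly positive.
    """
    edges = np.asarray(edges, dtype=float)
    w = np.asarray(weights, dtype=float)
    cdf = np.concatenate([[0.0], np.cumsum(w) / w.sum()])
    # The CDF is piecewise linear, so its inverse interpolates from CDF values to bin edges.
    return np.interp(t, cdf, edges)


def wasserstein2_sq(qa, qb):
    """Squared L2 Wasserstein distance between two 1-D distributions,
    approximated by the mean squared gap of their quantile functions
    on a uniform midpoint grid of probability levels."""
    return float(np.mean((qa - qb) ** 2))


def dynamic_clustering(Q, k, n_iter=100, seed=0):
    """DCA-style loop on quantile functions Q (n_items x n_levels)."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    # Initial prototypes: k distinct items chosen at random.
    protos = Q[rng.choice(n, size=k, replace=False)].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Allocation step: squared Wasserstein distance of every item to every prototype.
        d = np.array([[wasserstein2_sq(q, p) for p in protos] for q in Q])
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # the partition is stable
        labels = new_labels
        # Representation step: the barycenter's quantile function is the mean
        # of the members' quantile functions (empty clusters keep their prototype).
        for j in range(k):
            members = Q[labels == j]
            if len(members):
                protos[j] = members.mean(axis=0)
    # Criterion: sum of squared distances between items and their prototypes.
    criterion = sum(wasserstein2_sq(Q[i], protos[labels[i]]) for i in range(n))
    return labels, protos, criterion


# Usage: two histograms supported on [0, 2] and two supported on [10, 12].
t = (np.arange(200) + 0.5) / 200
hists = [([0, 1, 2], [1, 1]), ([0, 1, 2], [2, 1]),
         ([10, 11, 12], [1, 1]), ([10, 11, 12], [1, 2])]
Q = np.array([quantiles_from_histogram(e, w, t) for e, w in hists])
labels, protos, criterion = dynamic_clustering(Q, k=2)
print(labels, round(criterion, 3))
```

Neither the allocation step nor the barycenter update increases the criterion, so the partition stabilizes (the `n_iter` cap guards against ties); as with k-means, the result is a local optimum that can depend on the initial prototypes.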