Fuzzy clustering of distribution-valued data using an adaptive L2 Wasserstein distance

In this paper, a fuzzy c-means algorithm based on an adaptive L2 Wasserstein distance for histogram-valued data is proposed. The adaptive distance induces a set of weights associated with the components of the histogram-valued data and, thus, with the variables. The criterion of the fuzzy c-means algorithm is minimized in three alternating steps: the representation (prototypes), the allocation (memberships) and the weights associated with the components of the variables are computed in turn until the solution converges to a local optimum. The effectiveness of the proposed algorithm is demonstrated through experiments on synthetic and real-world datasets.
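The three alternating steps can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Each histogram is represented by its quantile function sampled at T midpoints of (0, 1), so the squared L2 Wasserstein distance ∫(F⁻¹(t) − G⁻¹(t))² dt is approximated by an average over that grid. The distance for each variable is split into a location part (squared difference of means) and a variability part (squared distance between the centred quantile functions). One weight per variable and part is learned under an assumed product-to-one constraint, and the weights are shared by all clusters. The names `component_distances` and `fuzzy_cmeans_aw` and the parameters `m`, `tol` and `seed` are hypothetical.

```python
import numpy as np


def component_distances(Q, G):
    """Split squared L2 Wasserstein distances into location and variability parts.

    Q: (n, p, T) quantile values of n objects on p histogram variables.
    G: (c, p, T) quantile values of c prototypes on the same grid.
    Returns (n, c, p, 2): [..., 0] = (difference of means)^2,
    [..., 1] = squared distance between the centred quantile functions.
    Their sum is the grid approximation of the squared Wasserstein distance.
    """
    diff = Q[:, None, :, :] - G[None, :, :, :]   # (n, c, p, T)
    location = diff.mean(axis=-1) ** 2
    variability = diff.var(axis=-1)              # mean of squared centred difference
    return np.stack([location, variability], axis=-1)


def fuzzy_cmeans_aw(Q, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Hypothetical fuzzy c-means with adaptive component weights (sketch)."""
    rng = np.random.default_rng(seed)
    n, p, _ = Q.shape
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)           # memberships sum to 1 per object
    lam = np.ones((p, 2))                       # assumption: all weights start at 1
    J_old = np.inf
    for _ in range(max_iter):
        # Step 1, representation: the prototype quantile function is the
        # U**m-weighted average of the objects' quantile functions.
        Um = U ** m
        G = np.einsum("nc,npt->cpt", Um, Q) / Um.sum(axis=0)[:, None, None]

        # Step 2, allocation: standard fuzzy c-means membership update
        # computed on the weighted (adaptive) distances.
        D = component_distances(Q, G)
        delta = np.einsum("ncps,ps->nc", D, lam) + 1e-12   # avoid division by 0
        ratio = (delta[:, :, None] / delta[:, None, :]) ** (1.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)

        # Step 3, weights: minimise sum(lam * S) subject to prod(lam) == 1,
        # which gives lam = geometric_mean(S) / S.
        Um = U ** m
        S = np.einsum("nc,ncps->ps", Um, D) + 1e-12
        lam = np.exp(np.log(S).mean()) / S

        J = np.einsum("nc,ncps,ps->", Um, D, lam)   # current criterion value
        if abs(J_old - J) < tol:
            break
        J_old = J
    return U, G, lam, J
```

For example, with `Q` of shape (n, p, T), `U, G, lam, J = fuzzy_cmeans_aw(Q, c=3)` returns the fuzzy partition, the prototype quantile functions, the learned weights and the final criterion value. `U.argmax(axis=1)` gives a crisp partition when one is needed. Averaging quantile functions in the representation step relies on a property of one-dimensional distributions: a weighted average of nondecreasing quantile functions is again a quantile function. Using cluster-specific weights instead of the single shared set assumed here would be an alternative design.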
