Clustering measure-valued data with Wasserstein barycenters

This work proposes learning schemes for measure-valued data, i.e. data whose structure is represented more efficiently by probability measures than by points in $\mathbb{R}^d$, built on the concept of the barycenter of probability measures defined with respect to the Wasserstein metric. Such approaches are valuable in fields where observational or experimental error is significant (e.g. astronomy, biology, remote sensing) or where the data are too complex for traditional learning algorithms to be applicable or effective (e.g. network data, interval data, high-frequency records, matrix data). From this perspective, each observation is identified with an appropriate probability measure, and the proposed statistical learning schemes rely on discrimination criteria that exploit the geometric structure of the space of probability measures through core techniques from optimal transport theory. The approaches are applied to two real-world problems: (a) clustering eurozone countries according to their observed government bond yield curves and (b) classifying the areas of a satellite image into land-use categories, a standard task in remote sensing. In both case studies the results are meaningful and the accuracy obtained is high.
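To make the idea of clustering around Wasserstein barycenters concrete, the following is a minimal sketch restricted to one-dimensional empirical measures, and it is not the authors' algorithm. It relies on a standard fact: in one dimension the 2-Wasserstein distance between two measures equals the $L^2$ distance between their quantile functions, so the barycenter of a cluster is obtained by averaging quantile functions. The function names (`quantile_embedding`, `w2_distance`, `wasserstein_kmeans_1d`), the grid of 50 quantile levels, and the random initialization are all illustrative assumptions.

```python
import numpy as np


def quantile_embedding(samples, n_quantiles=50):
    """Represent a 1-D empirical measure by its quantile function on a fixed grid."""
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(np.asarray(samples, dtype=float), levels)


def w2_distance(q1, q2):
    """Approximate 2-Wasserstein distance between two 1-D measures,
    computed as the discretized L2 distance between quantile functions."""
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))


def wasserstein_kmeans_1d(measures, k, n_iter=50, seed=0):
    """k-means-style clustering of 1-D measures (assumed setting, for illustration).

    Each cluster center is the Wasserstein barycenter of its members, which in
    one dimension is the pointwise average of their quantile functions.
    """
    rng = np.random.default_rng(seed)
    Q = np.stack([quantile_embedding(m) for m in measures])
    # Initialize centers with k distinct observations (fancy indexing copies).
    centers = Q[rng.choice(len(Q), size=k, replace=False)]
    labels = np.full(len(Q), -1)
    for _ in range(n_iter):
        # Squared W2 distance from every measure to every center.
        d = ((Q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = labels == j
            if np.any(members):
                # Barycenter update: average the members' quantile functions.
                centers[j] = Q[members].mean(axis=0)
    return labels, centers


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: each observation is itself a sample from a distribution.
    data = [rng.normal(0.0, 1.0, 200) for _ in range(10)]
    data += [rng.normal(10.0, 1.0, 200) for _ in range(10)]
    labels, _ = wasserstein_kmeans_1d(data, k=2)
    print(labels)
```

For measures on $\mathbb{R}^d$ with $d > 1$ the quantile shortcut is not available, and the barycenter has to be computed by other means, for instance numerically.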