Log-PCA versus Geodesic PCA of histograms in the Wasserstein space

This paper is concerned with the statistical analysis of data sets whose elements are random histograms. To learn principal modes of variation from such data, we consider the problem of computing the PCA of histograms with respect to the 2-Wasserstein distance between probability measures. To this end, we compare log-PCA and geodesic PCA in the Wasserstein space, as introduced by Bigot et al. (2015) and Seguy and Cuturi (2015). Geodesic PCA involves solving a non-convex optimization problem, which we propose to solve approximately with a novel forward-backward algorithm. This allows a detailed comparison between log-PCA and geodesic PCA of one-dimensional histograms, which we carry out on various data sets, stressing the benefits and drawbacks of each method. We extend these results to two-dimensional data and compare both methods in that setting.
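As a rough illustration of what log-PCA amounts to for one-dimensional histograms, the Python sketch below relies on a standard fact: on the real line, the 2-Wasserstein barycenter has the average of the quantile functions as its quantile function, so the log maps at the barycenter can be represented by centered quantile functions, and log-PCA reduces to ordinary PCA on them. This is not the authors' implementation. The function names, the grid, the number of quantile levels, the tiny mass added to keep each CDF strictly increasing, and the synthetic Gaussian data are all assumptions made for the example.

```python
import numpy as np

def quantile_functions(hists, grid, levels):
    """Approximate the quantile function of each histogram at the given levels."""
    hists = np.asarray(hists, dtype=float) + 1e-12  # assumption: tiny mass keeps the CDF strictly increasing
    pmf = hists / hists.sum(axis=1, keepdims=True)
    cdf = np.cumsum(pmf, axis=1)
    # Invert each CDF by linear interpolation: level -> grid position.
    return np.stack([np.interp(levels, c, grid) for c in cdf])

def log_pca_1d(hists, grid, n_levels=200, n_components=2):
    """Sketch of log-PCA for 1D histograms in quantile coordinates."""
    levels = (np.arange(n_levels) + 0.5) / n_levels
    Q = quantile_functions(hists, grid, levels)
    Q_bar = Q.mean(axis=0)      # quantile function of the Wasserstein barycenter
    V = Q - Q_bar               # log maps at the barycenter (tangent vectors)
    _, s, Vt = np.linalg.svd(V, full_matrices=False)
    components = Vt[:n_components]          # principal directions in the tangent space
    scores = V @ components.T               # coordinates of each histogram
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return Q_bar, components, scores, explained

# Synthetic example (assumption): Gaussian histograms that differ only by translation.
grid = np.linspace(-5.0, 5.0, 501)
means = np.linspace(-1.0, 1.0, 9)
hists = np.exp(-(grid[None, :] - means[:, None]) ** 2 / (2 * 0.5 ** 2))

Q_bar, components, scores, explained = log_pca_1d(hists, grid)
print(explained)
```

In this synthetic setting the histograms are translates of one another, so the first component is expected to be close to a constant shift and to carry almost all of the variance. Note that this sketch does not enforce that the reconstructed quantile functions stay non-decreasing, a constraint that geodesic PCA handles explicitly and log-PCA does not.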

[1] Y. Brenier. Polar Factorization and Monotone Rearrangement of Vector-Valued Functions, 1991.

[2] L. Ambrosio, et al. Gradient flows with metric and differentiable structures, and applications to the Wasserstein space, 2004.

[3] Antonio Irpino, et al. Dimension Reduction Techniques for Distributional Symbolic Data, 2013, IEEE Transactions on Cybernetics.

[4] Marco Cuturi, et al. Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric, 2015, NIPS.

[5] Gabriel Peyré, et al. Iterative Bregman Projections for Regularized Transportation Problems, 2014, SIAM J. Sci. Comput.

[6] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.

[7] C. Villani. Topics in Optimal Transportation, 2003.

[8] Jérémie Bigot, et al. Geodesic PCA in the Wasserstein space by Convex PCA, 2017.

[9] Antonin Chambolle, et al. On the ergodic convergence rates of a first-order primal–dual algorithm, 2016, Math. Program.

[10] Gustavo K. Rohde, et al. A Linear Optimal Transportation Framework for Quantifying and Visualizing Variations in Sets of Images, 2012, International Journal of Computer Vision.

[11] P. Thomas Fletcher, et al. Principal geodesic analysis for the study of nonlinear statistics of shape, 2004, IEEE Transactions on Medical Imaging.

[12] H. Whitney. Geometric Integration Theory, 1957.

[13] G. Burton. Topics in Optimal Transportation (Graduate Studies in Mathematics 58) by Cédric Villani: 370 pp., US$59.00, ISBN 0-8218-3312-X (American Mathematical Society, Providence, RI, 2003), 2004.

[14] L. Evans. Measure theory and fine properties of functions, 1992.

[15] Søren Hauberg, et al. Manifold Valued Statistics, Exact Principal Geodesic Analysis and the Effect of Linear Approximations, 2010, ECCV.

[16] Thomas Brox, et al. iPiano: Inertial Proximal Algorithm for Nonconvex Optimization, 2014, SIAM J. Imaging Sci.

[17] Dirk A. Lorenz, et al. An Inertial Forward-Backward Algorithm for Monotone Inclusions, 2014, Journal of Mathematical Imaging and Vision.

[18] H. Muller, et al. Functional data analysis for density functions by transformation to a Hilbert space, 2016, arXiv:1601.02869.

[19] Guillaume Carlier, et al. Barycenters in the Wasserstein Space, 2011, SIAM J. Math. Anal.

[20] L. Ambrosio, et al. Gradient Flows: In Metric Spaces and in the Space of Probability Measures, 2005.