Geodesic PCA in the Wasserstein space by Convex PCA

We introduce the method of Geodesic Principal Component Analysis (GPCA) on the space of probability measures on the line with finite second moment, endowed with the Wasserstein metric. We discuss the advantages of this approach over a standard functional PCA of probability densities in the Hilbert space of square-integrable functions. We establish the consistency of the method by showing that the empirical GPCA converges to its population counterpart as the sample size tends to infinity. A key property in the study of GPCA is the isometry between the Wasserstein space and a closed convex subset of the space of functions that are square-integrable with respect to an appropriate measure. We therefore consider the general problem of PCA in a closed convex subset of a separable Hilbert space, which serves as a basis for the analysis of GPCA and is also of interest in its own right. We provide illustrative examples on simple statistical models to show the benefits of this approach for data analysis. The method is also applied to a real dataset of population pyramids.
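The isometry mentioned above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes the standard quantile-function representation, in which a measure on the line is mapped to its quantile function on (0, 1) and the 2-Wasserstein distance equals the L2 distance between quantile functions. The abstract's "appropriate measure" may refer to a different but related parametrization. The function names `quantile_embedding` and `w2_distance` and the grid size are hypothetical choices made for this example.

```python
import numpy as np


def quantile_embedding(samples, grid):
    """Evaluate the empirical quantile function of `samples` on `grid` in (0, 1).

    Assumption for this sketch: each measure is represented by its quantile
    function, which is nondecreasing. Nondecreasing functions form a convex
    set, which is why PCA has to respect a convexity constraint.
    """
    return np.quantile(np.asarray(samples, dtype=float), grid)


def w2_distance(x, y, n_grid=1000):
    """Approximate the 2-Wasserstein distance between two empirical measures.

    Uses the one-dimensional identity W2(mu, nu) = ||F_mu^{-1} - F_nu^{-1}||_{L2(0,1)},
    discretized with a midpoint rule on `n_grid` points.
    """
    grid = (np.arange(n_grid) + 0.5) / n_grid
    qx = quantile_embedding(x, grid)
    qy = quantile_embedding(y, grid)
    return float(np.sqrt(np.mean((qx - qy) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    # Translating a measure by 2 moves it a Wasserstein distance of 2.
    print(w2_distance(x, x + 2.0))
```

In this representation, averaging two embedded measures pointwise gives another nondecreasing function, so the embedded set is convex. An unconstrained principal direction, however, can leave that set, which is the situation the convex PCA framework is designed to handle.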
