Simplicial principal component analysis for density functions in Bayes spaces

Probability density functions are frequently used to characterize the distributional properties of large-scale database systems. As functional compositions, densities primarily carry relative information. As such, standard methods of functional data analysis (FDA) are not appropriate for their statistical processing. The specific features of density functions are accounted for in Bayes spaces, which result from the generalization of the Aitchison geometry for compositional data to the infinite-dimensional setting. The aim is to build a concise methodology for functional principal component analysis of densities. A simplicial functional principal component analysis (SFPCA) is proposed, based on the geometry of the Bayes space B² of functional compositions. SFPCA is performed by exploiting the centred log-ratio transform, an isometric isomorphism between B² and L² which enables one to resort to standard FDA tools. The advantages of the proposed approach with respect to existing techniques are demonstrated using simulated data and a real-world example of population pyramids in Upper Austria.
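The role of the centred log-ratio (clr) transform can be illustrated with a minimal numerical sketch. It is not the authors' implementation: it assumes densities sampled on a uniform midpoint grid over a bounded interval, approximates integrals by Riemann sums, and replaces the smoothing-spline preprocessing of a full FDA workflow with a plain SVD. The function names `clr`, `clr_inv` and `sfpca`, and the toy Gaussian data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def clr(dens):
    """clr transform of densities sampled on a uniform grid (one density per row).

    On a uniform grid, the integral mean of log f is approximated by the
    arithmetic mean of the sampled log values.
    """
    log_d = np.log(dens)
    return log_d - log_d.mean(axis=1, keepdims=True)

def clr_inv(g, dx):
    """Map clr curves back to densities that integrate to one (Riemann sum)."""
    e = np.exp(g)
    return e / (e.sum(axis=1, keepdims=True) * dx)

def sfpca(dens, dx, n_comp=2):
    """Ordinary PCA on clr-transformed densities, using a discretized L2 inner product."""
    z = clr(dens)
    mean_clr = z.mean(axis=0)
    zc = z - mean_clr
    # Scaling by sqrt(dx) makes Euclidean inner products approximate L2 ones.
    u, s, vt = np.linalg.svd(zc * np.sqrt(dx), full_matrices=False)
    harmonics = vt[:n_comp] / np.sqrt(dx)        # unit L2 norm on the grid
    scores = (zc @ harmonics.T) * dx             # L2 projections onto the harmonics
    var = s[:n_comp] ** 2 / (dens.shape[0] - 1)  # variance of each score
    return mean_clr, harmonics, scores, var

# Toy data (assumption): unit-variance Gaussian densities with random means,
# restricted to [-5, 5] and renormalized on the grid.
rng = np.random.default_rng(0)
a, b, m = -5.0, 5.0, 200
dx = (b - a) / m
t = a + dx * (np.arange(m) + 0.5)
mus = rng.normal(0.0, 1.0, size=30)
dens = np.exp(-0.5 * (t[None, :] - mus[:, None]) ** 2)
dens /= dens.sum(axis=1, keepdims=True) * dx

mean_clr, harmonics, scores, var = sfpca(dens, dx)

# A perturbation of the mean density along the first harmonic, mapped back
# to a density through the inverse clr transform.
perturbed = clr_inv((mean_clr + np.sqrt(var[0]) * harmonics[0])[None, :], dx)
```

Because the analysis is carried out on clr curves and results are mapped back with `clr_inv`, every reconstructed or perturbed curve is a positive function that integrates to one, which applying PCA to the raw densities would not guarantee. In this toy example the clr curves differ only by a term linear in `t`, so the first component carries essentially all of the variability.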
