Robust factor analysis for compositional data

Factor analysis as a dimension reduction technique is widely used with compositional data. Applying the method to raw or improperly transformed data will, however, lead to biased results and consequently to misleading interpretations. Although some procedures suitable for factor analysis of compositional data have already been developed, they either require prior knowledge of variable groups or are complicated to handle. We present an approach based on the centred logratio (clr) transformation that does not rely on such prior knowledge but still respects the specific character of compositional data. In addition, the isometric logratio (ilr) transformation makes it possible to robustify factor analysis through a robust estimate of the covariance matrix. Back-transforming the results to the clr space allows them to be interpreted with compositional biplots. The method is demonstrated with data from the Kola project, a large ecogeochemical mapping project in northern Europe.
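The transformation steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`clr`, `ilr_basis`, `ilr`, `clr_covariance`), the Helmert-type orthonormal basis, and the simulated data are all assumptions made for illustration. The covariance estimator is left as a plug-in argument; the classical sample covariance is used here only as a stand-in, and a robust estimator (for example one based on the minimum covariance determinant) would be passed in its place. The sketch stops at the back-transformed covariance matrix and does not fit the factor model itself.

```python
import numpy as np

def clr(x):
    """Centred logratio: log of each part minus the row mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def ilr_basis(d):
    """One possible orthonormal basis V (d x (d-1)) with columns summing to zero.

    Assumption: a Helmert-type basis; any orthonormal basis of the clr
    hyperplane would serve the same purpose.
    """
    v = np.zeros((d, d - 1))
    for j in range(1, d):
        v[:j, j - 1] = 1.0 / j
        v[j, j - 1] = -1.0
        v[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return v

def ilr(x, v):
    """Isometric logratio coordinates: clr scores projected onto the basis V."""
    return clr(x) @ v

def clr_covariance(x, cov_estimator=None):
    """Estimate the covariance in ilr space and map it back to clr space.

    cov_estimator takes an (n, d-1) array and returns a (d-1, d-1) matrix.
    The default is the classical sample covariance, used here as a
    placeholder for a robust estimator.
    """
    if cov_estimator is None:
        cov_estimator = lambda z: np.cov(z, rowvar=False)
    v = ilr_basis(x.shape[1])
    s_ilr = cov_estimator(ilr(x, v))   # full rank, so robust estimators apply
    return v @ s_ilr @ v.T             # back-transformation to clr space

# Simulated compositions (hypothetical data): 50 samples, 4 parts, rows sum to 1.
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(50, 4))
comp = raw / raw.sum(axis=1, keepdims=True)

s_clr = clr_covariance(comp)
```

The detour through ilr coordinates is needed because clr scores sum to zero in every row, so their covariance matrix is singular, whereas the ilr covariance has full rank. With the classical estimator, the back-transformed matrix coincides with the sample covariance of the clr scores; the two differ once a robust estimator is plugged in.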
