A robust Parafac model for compositional data

ABSTRACT Compositional data carry only relative information, so the ratios between the parts, rather than their absolute values, are the quantities of interest for the analysis. Because of this, standard statistical methods should be applied only after the compositions have been expressed in coordinates with respect to an orthonormal basis. This paper discusses how three-way compositional data can be analyzed with the Parafac model. When the data are contaminated by outliers, robust estimates of the Parafac model parameters should be employed. It is demonstrated how robust estimation can be carried out for compositional data and how the results can be interpreted. A real data example from macroeconomics illustrates the usefulness of this approach.
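
To make the two ingredients named in the abstract concrete, the following is a minimal sketch in Python with NumPy, not the estimator proposed in the paper. It assumes a three-way array with the compositional parts in the second mode, uses one particular orthonormal basis (Helmert-type contrasts) to obtain isometric logratio (ilr) coordinates, and fits the Parafac model by plain alternating least squares, which is not robust to outliers. All function names (`ilr_basis`, `ilr`, `khatri_rao`, `parafac_als`) and the toy data are hypothetical and chosen for illustration only.

```python
import numpy as np

def ilr_basis(D):
    """One possible orthonormal basis (D x (D-1)) of the clr hyperplane."""
    V = np.zeros((D, D - 1))
    for j in range(D - 1):
        k = j + 1
        V[:k, j] = 1.0 / k            # first k parts get equal positive weight
        V[k, j] = -1.0                # contrasted against part k+1
        V[:, j] *= np.sqrt(k / (k + 1.0))  # scale the column to unit length
    return V

def ilr(X, axis=-1):
    """ilr coordinates of the compositions stored along `axis`."""
    L = np.log(np.moveaxis(X, axis, -1))
    clr = L - L.mean(axis=-1, keepdims=True)   # centred logratio
    Z = clr @ ilr_basis(L.shape[-1])
    return np.moveaxis(Z, -1, axis)

def khatri_rao(B, C):
    """Column-wise Kronecker product, shape (J*K, R)."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def parafac_als(X, R, n_iter=200, seed=0):
    """Classical (non-robust) Parafac fit by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # mode-3 unfolding
    losses = []
    for _ in range(n_iter):
        # Each update is an exact least-squares solve with the other two fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
        resid = X1 - A @ khatri_rao(B, C).T
        losses.append(float(np.sum(resid ** 2)))  # residual sum of squares
    return A, B, C, losses

# Hypothetical toy data: 6 objects x 4 compositional parts x 5 occasions.
rng = np.random.default_rng(1)
X = np.moveaxis(rng.dirichlet(np.ones(4), size=(6, 5)), -1, 1)
Z = ilr(X, axis=1)                     # 6 x 3 x 5 array of ilr coordinates
A, B, C, losses = parafac_als(Z, R=2)  # two-component model, chosen arbitrarily
```

In this sketch the second-mode loadings `B` refer to the ilr coordinates; multiplying by the basis, `ilr_basis(4) @ B`, expresses them in centred logratio terms, which can be easier to relate to the original parts. A robust variant would replace the least-squares updates with an outlier-resistant procedure, which is beyond what this sketch attempts.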
