Estimating latent trends in multivariate longitudinal data via Parafac2 with functional and structural constraints

Longitudinal data are inherently multimode: they are typically collected across several modes of variation, for example, time × variables × subjects. In many longitudinal studies, multiple variables are collected to measure one or more latent constructs of interest. In such cases, the goal is to understand temporal trends in the latent variables, as well as individual differences in those trends. Multimode component analysis models provide a powerful framework for discovering latent trends in longitudinal data. However, classic implementations of multimode models ignore both functional information (i.e., the temporal ordering of the collected data) and structural information (i.e., which variables load onto which latent factors) about the study design. In this paper, we show how functional and structural constraints can be imposed in multimode models (Parafac and Parafac2) to elucidate trends in longitudinal data. As a motivating example, we consider a longitudinal study of per capita alcohol consumption from 1970 to 2013, conducted by the U.S. National Institute on Alcohol Abuse and Alcoholism. We demonstrate how functional and structural information about the study design can be incorporated into the Parafac and Parafac2 alternating least squares algorithms to understand temporal and regional trends in three latent constructs: beer consumption, spirits consumption, and wine consumption. Our results reveal that Americans consume more than the recommended amount of alcohol and that total alcohol consumption shows no sign of decreasing over the last decade.
