Modeling multi‐way data with linearly dependent loadings

A generalization/specialization of the PARAFAC model is developed that improves its properties when applied to multi‐way problems involving linearly dependent factors. This model is called PARALIND (PARAllel profiles with LINear Dependences). Linear dependences can arise when the empirical sources of variation being modeled by factors are causally or logically linked during data generation, or circumstantially linked during data collection. For example, this can occur in a chemical context when end products are related to the precursor, or in a psychological context when a single stimulus generates two incompatible feelings at once. For such cases, the most theoretically appropriate PARAFAC model has loading vectors that are linearly dependent in at least one mode and, when collinear, are nonunique in the others. However, standard PARAFAC analysis of fallible data will have neither of these features. Instead, latent linear dependences become high surface correlations, and any latent nonuniqueness is replaced by a meaningless surface‐level ‘unique orientation’ that optimally fits the particular random noise in that sample. To avoid these problems, any set of components that in theory should be rank deficient is re‐expressed in PARALIND as a product of two matrices: one that explicitly represents their dependency relationships and another, with fewer columns, that captures their patterns of variation. To demonstrate the approach, we apply it first to fluorescence spectroscopy (excitation‐emission matrix, EEM) data in which concentration values for two analytes covary exactly, and then to flow injection analysis (FIA) data in which subsets of columns are logically constrained to sum to a constant, but differently in each of two modes. In the PARAFAC solutions of the EEM data, all factors are ‘unique’, but this is only meaningful for the two factors that are also unique at the latent level. In contrast, the PARALIND solutions directly display the extent and nature of partial nonuniqueness present at the latent level by exhibiting a corresponding partial uniqueness in their recovered loadings. For the FIA data, PARALIND constraints restore latent uniqueness to the concentration estimates. Comparison of the solutions shows that PARALIND more accurately recovers latent structure, presumably because it uses fewer parameters and hence fits less error. Copyright © 2009 John Wiley & Sons, Ltd.
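
The re‐expression of rank‐deficient loadings as a product of two matrices can be illustrated with a small synthetic example. The sketch below is not the authors' algorithm; it only builds noise‐free data with the structure described above. The names A_tilde (patterns of variation, fewer columns) and H (dependency matrix), the array sizes, and the 0/1 form of H are assumptions chosen for illustration, loosely mirroring the EEM case in which two analytes share one concentration profile.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: I samples, J and K spectral channels,
# S components in total, R < S distinct profiles in the first mode.
I, J, K = 6, 5, 4
S, R = 3, 2

# A_tilde holds the R distinct patterns of variation (assumed name).
A_tilde = rng.random((I, R))

# H encodes the dependency: components 1 and 2 share profile 1,
# component 3 has its own profile (assumed, fixed 0/1 pattern).
H = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# First-mode loadings written as a product of the two matrices.
A = A_tilde @ H          # shape (I, S), rank R

# Full-rank loadings in the other two modes.
B = rng.random((J, S))
C = rng.random((K, S))

# Trilinear (PARAFAC-type) array: X[i, j, k] = sum_s A[i, s] B[j, s] C[k, s]
X = np.einsum('is,js,ks->ijk', A, B, C)

rank_A = np.linalg.matrix_rank(A)
rank_unfolded = np.linalg.matrix_rank(X.reshape(I, J * K))
columns_identical = np.allclose(A[:, 0], A[:, 1])
```

In this construction the first‐mode loading matrix has three columns but rank two, and the first‐mode unfolding of X inherits that rank deficiency. Fitting such a model would estimate A_tilde, B and C with H fixed or constrained, which is where the saving in parameters mentioned above comes from.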
