Exploring dynamic metabolomics data with multiway data analysis: a simulation study

Background: Analysis of dynamic metabolomics data holds the promise of improving our understanding of the mechanisms underlying metabolism. For example, it may detect changes in metabolism due to the onset of a disease. Dynamic, or time-resolved, metabolomics data can be arranged as a three-way array with entries organized according to a subjects mode, a metabolites mode, and a time mode. While such time-evolving multiway data sets are increasingly collected, revealing the underlying mechanisms and their dynamics from them remains challenging. One of the complexities is the superposition of several sources of variation: induced variation (due to experimental conditions or inborn errors), individual variation, and measurement error. Multiway data analysis (also known as tensor factorization) has been successfully used in data mining to find the underlying patterns in multiway data. In this paper, we study the use of multiway data analysis to reveal the underlying patterns and dynamics in time-resolved metabolomics data.
Results: We focus on simulated data arising from dynamic models of increasing complexity: a simple linear system, a yeast glycolysis model, and a human cholesterol model. We generate data with induced variation as well as individual variation. Systematic experiments are performed to demonstrate the advantages and limitations of multiway data analysis in analyzing such dynamic metabolomics data and its capacity to disentangle the different sources of variation. We use simulations because knowing the ground truth makes it possible to assess what multiway data analysis methods can and cannot recover.
Conclusion: Our numerical experiments demonstrate that, despite the increasing complexity of the studied dynamic metabolic models, the tensor factorization methods CANDECOMP/PARAFAC (CP) and Parallel Profiles with Linear Dependences (Paralind) can disentangle the sources of variation and thereby reveal the underlying mechanisms and their dynamics.
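To make the three-way arrangement and the CP model concrete, the following is a minimal sketch, not the simulation setup or code used in the study. It builds a small subjects x metabolites x time array from two invented components (a group effect standing in for induced variation, random scores standing in for individual variation, plus additive noise) and fits a two-component CP model with a plain alternating least squares loop. All sizes, factor shapes, noise levels, and the helper names `khatri_rao` and `cp_als` are assumptions chosen for illustration.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product; row index runs over (j, k) with k fastest."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def cp_als(X, rank, n_iter=500, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Unfoldings of X along each mode, ordered to match khatri_rao above.
    X0 = X.reshape(I, J * K)
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        # Each update is a linear least squares problem with the other two factors fixed.
        A = X0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X2 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C


rng = np.random.default_rng(42)
n_subjects, n_metabolites, n_time = 20, 6, 30

# Subjects mode: component 1 carries a two-group effect (hypothetical induced
# variation) plus small individual variation; component 2 is purely individual.
group = np.repeat([1.0, 2.0], n_subjects // 2)
A_true = np.column_stack([
    group + 0.1 * rng.standard_normal(n_subjects),
    rng.uniform(0.5, 1.5, n_subjects),
])

# Metabolites mode: invented loadings describing which metabolites follow which pattern.
B_true = np.array([[1.0, 0.0], [0.8, 0.1], [0.5, 0.5],
                   [0.1, 0.9], [0.0, 1.0], [0.3, 0.7]])

# Time mode: one decaying and one saturating profile (illustrative shapes only).
t = np.linspace(0.0, 3.0, n_time)
C_true = np.column_stack([np.exp(-t), 1.0 - np.exp(-2.0 * t)])

# Three-way array: subjects x metabolites x time, with additive measurement noise.
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)
X = X + 0.01 * rng.standard_normal(X.shape)

A_hat, B_hat, C_hat = cp_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A_hat, B_hat, C_hat)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a sketch like this, the columns of `A_hat` play the role of subject scores, those of `B_hat` metabolite loadings, and those of `C_hat` temporal profiles, each recovered only up to scaling and permutation of components. Established implementations (for example the N-way Toolbox for MATLAB or the TensorLy library in Python) would normally be preferred over a hand-written loop.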
