UvA-DARE (Digital Academic Repository) Separating common from distinctive variation

Background: Joint and individual variation explained (JIVE), distinct and common simultaneous component analysis (DISCO), and O2-PLS, a two-block (X-Y) latent variable regression method with an integral OSC filter, can all be used for the integrated analysis of multiple data sets. Each decomposes the data into three terms: a low(er)-rank approximation capturing the variation common across data sets, low(er)-rank approximations of the structured variation distinctive for each data set, and residual noise. In this paper these three methods are compared with respect to their mathematical properties and their respective ways of defining common and distinctive variation.

Results: The methods are all applied to simulated data and to mRNA and miRNA data sets from glioblastoma multiforme (GBM) brain tumors to examine their overlap and differences. When the common variation is abundant, all methods are able to find the correct solution. With real data, however, complexities in the data are treated differently by the three methods.

Conclusions: All three methods have their own approach to estimating common and distinctive variation, each with specific strengths and weaknesses. Because of their orthogonality properties and the algorithms they use, their views on the data differ slightly. Assuming orthogonality between common and distinctive variation may lead to misinterpretation of true natural or biological phenomena that are not orthogonal at all.
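The three-term decomposition described in the abstract (common part, distinctive part, residual) can be illustrated with a deliberately naive sketch. This is not JIVE, DISCO, or O2-PLS: it takes a truncated SVD of the concatenated blocks as the common part and a truncated SVD of what remains in each block as the distinctive part, with no iteration and no enforced orthogonality between the two. The function names, the ranks, the simulated data, and the assumption that both blocks share their rows (samples) are all hypothetical choices made for illustration.

```python
import numpy as np


def low_rank(M, r):
    """Best rank-r approximation of M (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]


def naive_common_distinctive(X1, X2, r_common, r_dist):
    """Split each block into common + distinctive + residual.

    Assumption: X1 and X2 have the same samples in their rows.
    This is an illustrative one-pass scheme, not a published method.
    """
    # Common part: low-rank approximation of the concatenated blocks.
    C = low_rank(np.hstack([X1, X2]), r_common)
    C1, C2 = C[:, : X1.shape[1]], C[:, X1.shape[1]:]
    # Distinctive part: low-rank structure left in each block separately.
    D1 = low_rank(X1 - C1, r_dist)
    D2 = low_rank(X2 - C2, r_dist)
    # Residual: whatever neither part explains.
    return (C1, D1, X1 - C1 - D1), (C2, D2, X2 - C2 - D2)


# Simulated example: one common and one distinctive component per block.
rng = np.random.default_rng(0)
n, p1, p2 = 50, 30, 20
t_common = rng.normal(size=(n, 1))
X1 = (3 * t_common @ rng.normal(size=(1, p1))
      + rng.normal(size=(n, 1)) @ rng.normal(size=(1, p1))
      + 0.1 * rng.normal(size=(n, p1)))
X2 = (3 * t_common @ rng.normal(size=(1, p2))
      + rng.normal(size=(n, 1)) @ rng.normal(size=(1, p2))
      + 0.1 * rng.normal(size=(n, p2)))

(C1, D1, E1), (C2, D2, E2) = naive_common_distinctive(X1, X2, 1, 1)
```

By construction each block is reproduced exactly by the sum of its three terms. How much of the signal ends up in the common versus the distinctive term depends on the chosen ranks and on how strongly the simulated distinctive scores happen to correlate with the common ones, which is the kind of ambiguity the compared methods resolve in different ways.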
