Two-way analysis of high-dimensional collinear data

We present a Bayesian model for two-way ANOVA-type analysis of high-dimensional datasets with small sample sizes and highly correlated groups of variables. Modern cellular measurement methods are a main application area; the typical task is differential analysis between diseased and healthy samples, complicated by additional covariates that require a multi-way analysis. The main difficulty is the combination of high dimensionality and low sample size, which makes classical multivariate techniques unusable. We introduce a hierarchical model that performs dimensionality reduction by assuming that the input variables come in similarly behaving groups, and then performs an ANOVA-type decomposition on the resulting reduced-dimensional latent variables. We apply the model to lipidomic profiles from a recent large-cohort human diabetes study.
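
The structure described above can be illustrated with a small simulation. The following is a minimal sketch and not the model of the paper: it assumes the variable groups are known, uses plain group averages in place of inferred latent variables, and computes a classical two-way decomposition of cell means rather than a Bayesian posterior. All function names, parameters, and default values (`simulate`, `two_way_decomposition`, `n_per_cell`, `vars_per_group`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate(n_per_cell=5, n_groups=3, vars_per_group=20, noise=0.5, seed=0):
    """Simulate grouped, collinear data with two binary factors (assumed setup)."""
    rng = np.random.default_rng(seed)
    # Two binary covariates, e.g. disease status (a) and a second factor (b).
    a = np.repeat([0, 0, 1, 1], n_per_cell)
    b = np.repeat([0, 1, 0, 1], n_per_cell)
    n = a.size
    # Main effects and interaction act on the low-dimensional latent variables.
    alpha = rng.normal(size=n_groups)
    beta = rng.normal(size=n_groups)
    ab = rng.normal(size=n_groups)
    z = (np.outer(a, alpha) + np.outer(b, beta) + np.outer(a * b, ab)
         + rng.normal(size=(n, n_groups)))
    # Each observed variable belongs to one group and copies that group's
    # latent variable plus independent noise, giving highly correlated groups.
    group = np.repeat(np.arange(n_groups), vars_per_group)
    x = z[:, group] + noise * rng.normal(size=(n, group.size))
    return x, a, b, group


def two_way_decomposition(x, a, b, group):
    """Reduce variables to group averages, then split cell means into effects."""
    n_groups = group.max() + 1
    # Crude stand-in for the latent variables: the average within each group.
    z_hat = np.stack(
        [x[:, group == k].mean(axis=1) for k in range(n_groups)], axis=1
    )
    # Cell means, shape (2, 2, n_groups): levels of a, levels of b, groups.
    cell = np.array([
        [z_hat[(a == i) & (b == j)].mean(axis=0) for j in (0, 1)]
        for i in (0, 1)
    ])
    grand = cell.mean(axis=(0, 1))
    main_a = cell.mean(axis=1) - grand
    main_b = cell.mean(axis=0) - grand
    inter = cell - grand - main_a[:, None, :] - main_b[None, :, :]
    return cell, grand, main_a, main_b, inter


if __name__ == "__main__":
    x, a, b, group = simulate()
    cell, grand, main_a, main_b, inter = two_way_decomposition(x, a, b, group)
    print("data shape:", x.shape)
    print("main effect of a per group:", main_a[1] - main_a[0])
```

With the default settings there are 20 samples and 60 variables, so the full covariance matrix cannot be estimated reliably, while the three group averages can still be analysed with an ordinary two-way decomposition. The hierarchical model in the paper goes further by treating the latent variables and effects probabilistically.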
