Two-Way Analysis of High-Dimensional Collinear Data

We present a Bayesian model for two-way ANOVA-type analysis of high-dimensional, small-sample-size datasets with highly correlated groups of variables. Modern cellular measurement methods are a main application area; typically the task is differential analysis between diseased and healthy samples, complicated by additional covariates that require a multi-way analysis. The main complication is the combination of high dimensionality and low sample size, which renders classical multivariate techniques useless. We introduce a hierarchical model that performs dimensionality reduction by assuming that the input variables come in similarly behaving groups, and performs an ANOVA-type decomposition on the resulting set of reduced-dimensional latent variables. We apply the method to lipidomic profiles from a recent large-cohort human diabetes study.
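The two ideas in the abstract, grouping similarly behaving variables into a few latent variables and then decomposing those latent variables into two-way effects, can be illustrated with a small simulation. The sketch below is not the authors' Bayesian model. It assumes the variable groups are known in advance, uses two binary covariates, and estimates effects with plain cell-mean contrasts instead of posterior inference. All names and effect sizes (`ALPHA`, `BETA`, `INTERACTION`, `simulate`, `estimate_effects`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical ground-truth effects on K = 3 latent variables (illustrative values only).
ALPHA = np.array([1.0, 0.0, 0.0])        # effect of covariate a (e.g. diseased vs healthy)
BETA = np.array([0.0, -1.0, 0.0])        # effect of covariate b (a second binary factor)
INTERACTION = np.array([0.0, 0.0, 1.5])  # a-by-b interaction effect


def simulate(n_per_cell=5, vars_per_group=20, noise_sd=0.5, seed=0):
    """Draw a small-sample, high-dimensional dataset with grouped, collinear variables."""
    rng = np.random.default_rng(seed)
    k = ALPHA.size
    # Each observed variable belongs to exactly one group (assumed known here).
    membership = np.repeat(np.arange(k), vars_per_group)
    loadings = np.eye(k)[membership]  # binary (p, k) projection matrix
    # Balanced 2 x 2 design: n_per_cell samples in each (a, b) cell.
    a = np.repeat([0, 0, 1, 1], n_per_cell)
    b = np.repeat([0, 1, 0, 1], n_per_cell)
    # ANOVA-type mean structure on the latent variables, plus unit-variance noise.
    latent_mean = np.outer(a, ALPHA) + np.outer(b, BETA) + np.outer(a * b, INTERACTION)
    latent = latent_mean + rng.normal(size=latent_mean.shape)
    # Observed variables copy their group's latent value and add independent noise.
    noise = noise_sd * rng.normal(size=(a.size, membership.size))
    observed = latent @ loadings.T + noise
    return observed, a, b, membership


def estimate_effects(observed, a, b, membership):
    """Average variables within each group, then contrast the four cell means."""
    k = membership.max() + 1
    scores = np.stack(
        [observed[:, membership == g].mean(axis=1) for g in range(k)], axis=1
    )
    cell = {
        (i, j): scores[(a == i) & (b == j)].mean(axis=0)
        for i in (0, 1)
        for j in (0, 1)
    }
    alpha_hat = cell[1, 0] - cell[0, 0]
    beta_hat = cell[0, 1] - cell[0, 0]
    inter_hat = cell[1, 1] - cell[1, 0] - cell[0, 1] + cell[0, 0]
    return alpha_hat, beta_hat, inter_hat
```

With the default settings the data matrix has 20 samples and 60 variables, so there are more variables than samples, yet only three group-level scores need to be compared across the design cells. A fully Bayesian treatment, as described in the abstract, would additionally infer the grouping and quantify the uncertainty of each effect.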

Matej Oresic | Samuel Kaski | Ilkka Huopaniemi | Tommi Suvitaival | Janne Nikkilä

[1] Francis R. Bach, et al. Sparse probabilistic projections, 2008, NIPS.

[2] Bing Zhang, et al. An Integrated Approach for the Analysis of Biological Pathways using Mixed Models, 2008, PLoS Genetics.

[3] Guido Sanguinetti, et al. MMG: a probabilistic tool to identify submodules of metabolic pathways, 2008, Bioinform.

[4] G. Celeux, et al. Mixture of linear mixed models for clustering gene expression profiles from repeated microarray experiments, 2005.

[5] Ralf Steuer, et al. Review: On the analysis and interpretation of correlations in metabolomic data, 2006, Briefings Bioinform.

[6] Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[7] Ø. Langsrud, et al. 50–50 multivariate analysis of variance for collinear responses, 2002.

[8] Matthew J. Beal, et al. Gene Expression Time Course Clustering with Countably Infinite Hidden Markov Models, 2006, UAI.

[9] Wei Pan, et al. Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data, 2007, Bioinform.

[10] Kui Wang, et al. A Mixture model with random-effects components for clustering correlated gene-expression profiles, 2006, Bioinform.

[11] Olli Simell, et al. Dysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetes, 2008, The Journal of Experimental Medicine.

[12] Daniel B. Rowe. On Estimating the Mean in Bayesian Factor Analysis, 2000.

[13] Olli Simell, et al. Gender-dependent progression of systemic metabolic states in early childhood, 2008, Molecular Systems Biology.

[14] Age K. Smilde, et al. Assessment of PLSDA cross validation, 2008.

[15] Pascal J. Goldschmidt-Clermont, et al. Of mice and men: Sparse statistical modeling in cardiovascular genomics, 2007, 0709.0165.

[16] Matej Oresic, et al. Two-way analysis of high-dimensional collinear data, 2009, Data Mining and Knowledge Discovery.

[17] Age K. Smilde, et al. Statistical validation of megavariate effects in ASCA, 2007, BMC Bioinformatics.

[18] Jilles Vreeken, et al. Identifying the components, 2009, Data Mining and Knowledge Discovery.

[19] Zoubin Ghahramani, et al. A Unifying Review of Linear Gaussian Models, 1999, Neural Computation.

[20] Age K. Smilde, et al. ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data, 2005, Bioinform.

[21] Zoubin Ghahramani, et al. Variational Inference for Bayesian Mixtures of Factor Analysers, 1999, NIPS.

[22] A. Brix. Bayesian Data Analysis, 2nd edn, 2005.