Quality Optimised Analysis of General Paired Microarray Experiments

In microarray experiments, several steps may cause sub-optimal quality, so the need for quality control is strong. The experiments are often complex, with several conditions studied simultaneously. A linear model for paired microarray experiments is proposed as a generalisation of the paired two-sample method by Kristiansson et al. (2005). Quality variation is modelled by different variance scales for different (pairs of) arrays, and shared sources of variation are modelled by covariances between arrays. The gene-wise variance estimates are moderated in an empirical Bayes approach. Because of the correlations, all data are typically used in the inference of any linear combination of parameters. Both real and simulated data are analysed. Unequal variances and strong correlations are found in real data, leading to further examination of the fit of the model and of the nature of the datasets in general. The empirical distributions of the test statistics are found to match the null distribution considerably better than those of previous methods, which implies more accurate p-values provided that most genes are non-differentially expressed. In fact, assuming independent observations with identical variances typically leads to optimistic (too small) p-values. The method is shown to perform better than the alternatives in the simulation study.
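As a rough illustration of why unequal variance scales and covariances between arrays matter, the sketch below computes a generalised least squares (GLS) estimate of one gene's mean log-ratio. It is not the authors' implementation. It assumes the covariance structure between array pairs is known up to a gene-wise scale, whereas the proposed method estimates that structure and also moderates the gene-wise variances, neither of which is shown here. The function names `solve` and `gls_mean`, the example matrix, and the data values are all hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] as floats.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gls_mean(y, sigma):
    """GLS estimate of a common mean for one gene.

    y     : log-ratios, one per array pair
    sigma : assumed (hypothetical) covariance structure between array pairs,
            known up to a gene-wise scale factor
    Returns (mu_hat, weights, variance_factor), where the weights sum to 1 and
    Var(mu_hat) = gene-wise scale * variance_factor.
    """
    ones = [1.0] * len(y)
    w = solve(sigma, ones)            # Sigma^{-1} 1 (sigma is symmetric)
    total = sum(w)                    # 1' Sigma^{-1} 1
    weights = [wi / total for wi in w]
    mu_hat = sum(wi * yi for wi, yi in zip(weights, y))
    return mu_hat, weights, 1.0 / total


# Hypothetical data: the third array pair is noisy (variance scale 4).
y = [1.0, 1.0, 10.0]
sigma = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 4.0]]

mu_hat, weights, var_factor = gls_mean(y, sigma)
print("unweighted mean:", sum(y) / len(y))
print("GLS estimate   :", mu_hat)
print("weights        :", weights)
```

In this toy setting the low-quality array pair is down-weighted instead of discarded. Off-diagonal entries in `sigma` would similarly reduce the combined weight of array pairs that share a source of variation.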

[1] P. R. Nelson. Continuous Univariate Distributions Volume 2, 1996.

[2] N. L. Johnson, et al. Continuous Univariate Distributions, 1995.

[3] G. Churchill. Fundamentals of experimental design for cDNA microarrays, 2002, Nature Genetics.

[4] Jean YH Yang, et al. Bioconductor: open software development for computational biology and bioinformatics, 2004, Genome Biology.

[5] Ingrid Lönnstedt. Replicated microarray data, 2001.

[6] Kim Johnson, et al. QA/QC as a pressing need for microarray analysis: meeting report from CAMDA'02, 2003, BioTechniques.

[7] Gordon K Smyth, et al. Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments, 2011, Statistical Applications in Genetics and Molecular Biology.

[8] T. Speed, et al. Summaries of Affymetrix GeneChip probe level data, 2003, Nucleic Acids Research.

[9] Erik Kristiansson, et al. Weighted Analysis of Paired Microarray Experiments, 2005, Statistical Applications in Genetics and Molecular Biology.

[10] Xinqiang Han, et al. Genomic profiling of the human heart before and after mechanical support with a ventricular assist device reveals alterations in vascular signaling networks, 2004, Physiological Genomics.

[11] S. Dudoit, et al. Microarray expression profiling identifies genes with altered expression in HDL-deficient mice, 2000, Genome Research.

[12] Pierre Baldi, et al. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes, 2001, Bioinform.

[13] Y. L. Tong. The multivariate normal distribution, 1989.

[14] S. Dudoit, et al. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation, 2002, Nucleic Acids Research.

[15] Juan P Steibel, et al. On Reference Designs For Microarray Experiments, 2005, Statistical Applications in Genetics and Molecular Biology.

[16] Gordon K. Smyth, et al. limma: Linear Models for Microarray Data, 2005.

[17] Arjun K. Gupta. The Theory of Linear Models and Multivariate Analysis, 1981.

[18] Alex E. Lash, et al. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, 2002, Nucleic Acids Res.