Comprehensive analysis of correlation coefficients estimated from pooling heterogeneous microarray data

Background: The synthesis of information across microarray studies has been performed either by combining the statistical results of individual studies (as in a mosaic) or by combining data from multiple studies into one large pool that is analyzed as a single data set (as in a melting pot of data). Data heterogeneity across microarray studies, such as differences within and between labs or differences among experimental conditions, could lead to equivocal results in a melting pot approach.

Results: We applied statistical theory to determine the specific effect of different means and heteroskedasticity across 19 groups of microarray data on the sign and magnitude of gene-to-gene Pearson correlation coefficients obtained from the pool of 19 groups. We quantified the biases of the pooled coefficients and compared them to the biases of correlations estimated by an effect-size model. Mean differences across the 19 groups were the main factor determining the magnitude and sign of the pooled coefficients, whose bias was largest as they approached ±1. Heteroskedasticity alone across the pool of 19 groups resulted in less efficient estimates of the correlations than did a classical meta-analysis approach of combining correlation coefficients. These results were corroborated by simulation studies involving either mean differences or heteroskedasticity across a pool of N > 2 groups.

Conclusions: The combination of statistical results is best suited for synthesizing the correlation between expression profiles of a gene pair across several microarray studies.
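
The effect of mean differences on a pooled correlation can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes two uncorrelated "genes" measured in 19 groups whose means shift together, and it uses a standard fixed-effect Fisher z combination as a stand-in for the classical meta-analysis approach (the effect-size model mentioned above is not reproduced). The group size, the mean shift, the seed, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_combined(groups):
    """Fixed-effect combination of per-group correlations on the Fisher z scale.

    Each group's z = atanh(r) is weighted by n - 3, the inverse of its
    approximate sampling variance, then the average is back-transformed.
    """
    num = den = 0.0
    for x, y in groups:
        w = len(x) - 3
        num += w * math.atanh(pearson(x, y))
        den += w
    return math.tanh(num / den)


def simulate(n_groups=19, n_per_group=30, shift=2.0, seed=0):
    """Simulate groups in which x and y are independent within each group,
    but both means increase by `shift` from one group to the next.
    All parameter values are assumptions for this illustration."""
    rng = random.Random(seed)
    groups = []
    for g in range(n_groups):
        mu = g * shift
        x = [rng.gauss(mu, 1.0) for _ in range(n_per_group)]
        y = [rng.gauss(mu, 1.0) for _ in range(n_per_group)]
        groups.append((x, y))
    return groups


groups = simulate()

# "Melting pot": concatenate all groups and compute one correlation.
pooled_x = [v for x, _ in groups for v in x]
pooled_y = [v for _, y in groups for v in y]
pooled_r = pearson(pooled_x, pooled_y)

# "Mosaic": combine the per-group correlations instead.
combined_r = fisher_combined(groups)

print(f"pooled r   = {pooled_r:.3f}")
print(f"combined r = {combined_r:.3f}")
```

Under these assumptions the true within-group correlation is zero, yet the pooled coefficient is driven toward +1 by the shared mean shift, while the combined coefficient stays close to zero. Reversing the direction of the shift for one variable would push the pooled coefficient toward -1 instead.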
