Assumption weighting for incorporating heterogeneity into meta-analysis of genomic data

Motivation: There is now a large literature on statistical methods for the meta-analysis of genomic data from multiple studies. However, many of these analyses rely on a crucial assumption: that the data exhibit small between-study variation, or that this heterogeneity can be adequately modelled probabilistically.

Results: In this article, we propose 'assumption weighting', which exploits the weighted hypothesis testing framework of Genovese et al. to incorporate tests of between-study variation into meta-analysis. The methodology is fast and computationally simple to implement. We consider several weighting schemes and compare them in simulation studies. We also illustrate the proposed methodology using data from several high-profile stem cell gene expression datasets.
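The following is a minimal Python sketch of the general idea under stated assumptions. It is not the authors' implementation. Each gene's study-level effect estimates are combined with an inverse-variance fixed-effect model, and between-study variation is tested with Cochran's Q statistic (Cochran, 1954). The resulting heterogeneity p-values are turned into weights that average 1, as the Genovese et al. framework requires. Finally, the Benjamini-Hochberg procedure is applied to the weighted p-values p_i / w_i. The weighting rule used here, with weight proportional to the heterogeneity p-value, is a hypothetical illustration and not necessarily one of the schemes compared in the article. All function names are likewise hypothetical, and the input numbers are made up.

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square with integer df >= 1 (closed forms, stdlib only)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^{-x/2} sum_{r=1}^{(df-1)/2} x^(r-1) / (1*3*...*(2r-1))
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def fixed_effect_meta(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance fixed-effect meta-analysis of one gene across k >= 2 studies.

    Returns (p_meta, p_het): the two-sided p-value of the pooled effect and the
    p-value of Cochran's Q test for between-study heterogeneity (df = k - 1).
    """
    if len(effects) < 2 or len(effects) != len(variances):
        raise ValueError("need matching effects/variances from at least two studies")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    z = pooled * math.sqrt(sw)
    p_meta = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    p_het = chi2_sf(q, len(effects) - 1)
    return p_meta, p_het


def assumption_weights(p_het: Sequence[float]) -> List[float]:
    """Hypothetical scheme: weight each gene in proportion to its heterogeneity
    p-value, rescaled so the weights average 1."""
    m, s = len(p_het), sum(p_het)
    if s == 0:
        return [1.0] * m  # no usable information: fall back to equal weights
    return [m * h / s for h in p_het]


def weighted_bh(p: Sequence[float], w: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up applied to the weighted p-values p_i / w_i."""
    m = len(p)
    q = [pi / wi if wi > 0 else math.inf for pi, wi in zip(p, w)]
    order = sorted(range(m), key=lambda i: q[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= alpha * rank / m:
            k = rank  # keep the largest rank satisfying the BH bound
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Three genes, three studies each (made-up numbers): a consistent effect,
    # a strongly heterogeneous effect, and no effect.
    effects = [[1.0, 1.1, 0.9], [2.0, -0.5, 1.5], [0.02, -0.03, 0.01]]
    variances = [[0.04] * 3] * 3
    results = [fixed_effect_meta(e, v) for e, v in zip(effects, variances)]
    p_meta = [r[0] for r in results]
    weights = assumption_weights([r[1] for r in results])
    print(weighted_bh(p_meta, [1.0] * 3))  # plain BH    -> [True, True, False]
    print(weighted_bh(p_meta, weights))    # weighted BH -> [True, False, False]
```

In this toy example, plain BH declares the first two genes significant. The weighted version drops the second gene, whose study-level effects disagree sharply even though their pooled estimate is large. Weighting on heterogeneity p-values rather than on Q itself is one way to keep the weights comparable when genes are measured in different numbers of studies, because the null distribution of Q depends on k.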

[1] M. Pellegrini et al., Molecular analyses of human induced pluripotent stem cells and embryonic stem cells, 2010, Cell Stem Cell.

[2] Giovanni Parmigiani et al., A Bayesian Model for Cross-Study Differential Gene Expression, 2009, Journal of the American Statistical Association.

[3] T. Barrette et al., Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer, 2002, Cancer Research.

[4] Rainer Breitling et al., RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis, 2006, Bioinformatics.

[5] Y. Benjamini et al., Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[6] G. Abecasis et al., A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants, 2007, Science.

[7] Fuad G. Gwadry et al., Comparing cDNA and oligonucleotide array data: concordance of gene expression across platforms for the NCI-60 cancer cells, 2003, Genome Biology.

[8] Carl Murie et al., A methodology for global validation of microarray experiments, 2006, BMC Bioinformatics.

[9] Yinglei Lai et al., A mixture model approach to the tests of concordance and discordance between two large-scale experiments with two-sample groups, 2007, Bioinformatics.

[10] Andrew B. Nobel et al., Merging two gene-expression studies via cross-platform normalization, 2008, Bioinformatics.

[11] Jia Li et al., Biomarker detection in the integration of multiple multi-class genomic studies, 2010, Bioinformatics.

[12] R. Collins et al., Newly identified loci that influence lipid concentrations and risk of coronary artery disease, 2008, Nature Genetics.

[13] L. Hedges et al., Statistical Methods for Meta-Analysis, 1987.

[14] Gordon K. Smyth et al., Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments, 2004, Statistical Applications in Genetics and Molecular Biology.

[15] Mike J. Mason et al., Induced pluripotent stem cells and embryonic stem cells are distinguished by gene expression signatures, 2009, Cell Stem Cell.

[17] S. L. Normand et al., Meta-analysis: formulating, evaluating, combining, and reporting, 1999, Statistics in Medicine.

[18] Aaron M. Newman et al., Lab-specific gene expression signatures in pluripotent stem cells, 2010, Cell Stem Cell.

[19] J. Tchinda et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer, 2006, Science.

[20] R. Tibshirani et al., Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[21] David M. Simcha et al., Tackling the widespread and critical impact of batch effects in high-throughput data, 2010, Nature Reviews Genetics.

[22] S. Thompson et al., Quantifying heterogeneity in a meta-analysis, 2002, Statistics in Medicine.

[23] N. Harris et al., The World Health Organization (WHO) classification of the myeloid neoplasms, 2002, Blood.

[24] Hyungwon Choi et al., A Latent Variable Approach for Meta-Analysis of Gene Expression Data from Multiple Microarray Experiments, 2007, BMC Bioinformatics.

[25] R. Myers et al., Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data, 2005, Nucleic Acids Research.

[26] Sangsoo Kim et al., Combining multiple microarray studies and modeling interstudy variation, 2003, ISMB.

[27] Richard A. Young et al., Chromatin structure and gene expression programs of human embryonic and induced pluripotent stem cells, 2010, Cell Stem Cell.

[28] Jean Yee Hwa Yang et al., Comparison study of microarray meta-analysis methods, 2010, BMC Bioinformatics.

[29] L. Wasserman et al., False discovery control with p-value weighting, 2006.

[30] Peter J. Bickel et al., Measuring reproducibility of high-throughput experiments, 2011, arXiv:1110.4705.

[31] W. G. Cochran, The combination of estimates from different experiments, 1954.

[32] Giovanni Parmigiani et al., A Cross-Study Comparison of Gene Expression Studies for the Molecular Classification of Lung Cancer, 2004, Clinical Cancer Research.