Using the ratio of means as the effect size measure in combining results of microarray experiments

Background
Development of efficient analytic methodologies for combining microarray results is a major challenge in gene expression analysis. The widely used effect size models are thought to provide an efficient modeling framework for this purpose: the measures of association for each study and each gene are combined, weighted by their standard errors. A significant disadvantage of this strategy is that the quality of different data sets may be highly variable, but this information is usually neglected during the integration. Moreover, the estimated standard deviations in the commonly used effect size measures (such as the standardized mean difference) are known to be unstable when the sample sizes in each group are small.

Results
We propose a re-parameterization of the traditional mean difference based effect measure by using the log ratio of means as an effect size measure for each gene in each study. The estimated effect sizes for all studies were then combined under two modeling frameworks: the quality-unweighted random effects models and the quality-weighted random effects models. We defined the quality measure as a function of the detection p-value, which indicates whether a transcript is reliably detected or not on the Affymetrix gene chip. The new effect size measure was evaluated and compared under the quality-weighted and quality-unweighted data integration frameworks using simulated data sets, and also in several data sets of prostate cancer patients and controls. We focus on identifying differentially expressed biomarkers for prediction of cancer outcomes.

Conclusion
Our results show that the proposed effect size measure (log ratio of means) has better power to identify differentially expressed genes than the commonly used effect size measure, the standardized mean difference (SMD), and that the detected genes perform better in predicting cancer outcomes, under both quality-weighted and quality-unweighted data integration frameworks. The new effect size measure and the quality-weighted microarray data integration framework provide efficient ways to combine microarray results.
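
The idea of a log ratio of means combined under a random effects model can be illustrated with a minimal sketch for a single gene. The abstract does not give the formulas, so everything below is an assumption rather than the authors' implementation: the variance of the log ratio is taken from the usual first-order delta-method approximation, the between-study variance uses the DerSimonian-Laird moment estimator, and the quality weighting simply multiplies each inverse-variance weight by a hypothetical score `quality[i]` in [0, 1]. How such a score would be derived from the detection p-value is not specified here. The function names are illustrative only.

```python
import math


def log_ratio_of_means(treat, ctrl):
    """Effect size and approximate variance for one gene in one study.

    Assumes raw-scale (positive) expression values, so both group means
    are > 0, and at least two samples per group.
    """
    n_t, n_c = len(treat), len(ctrl)
    m_t = sum(treat) / n_t
    m_c = sum(ctrl) / n_c
    # Unbiased sample variances of each group.
    v_t = sum((x - m_t) ** 2 for x in treat) / (n_t - 1)
    v_c = sum((x - m_c) ** 2 for x in ctrl) / (n_c - 1)
    effect = math.log(m_t / m_c)
    # First-order delta-method approximation (an assumption of this sketch).
    variance = v_t / (n_t * m_t ** 2) + v_c / (n_c * m_c ** 2)
    return effect, variance


def random_effects(effects, variances, quality=None):
    """Pool per-study effects; returns (pooled, pooled_variance, tau2).

    quality: optional hypothetical per-study scores in [0, 1];
    None gives the quality-unweighted model.
    """
    k = len(effects)
    if quality is None:
        quality = [1.0] * k
    # Fixed-effect weights and Cochran's Q statistic.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q_stat = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # DerSimonian-Laird estimate of between-study variance, truncated at 0.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q_stat - (k - 1)) / c) if c > 0 else 0.0
    # Quality-adjusted random-effects weights.
    u = [qi / (vi + tau2) for qi, vi in zip(quality, variances)]
    su = sum(u)
    pooled = sum(ui * yi for ui, yi in zip(u, effects)) / su
    # Variance of the weighted mean under the random effects model.
    pooled_var = sum(ui ** 2 * (vi + tau2) for ui, vi in zip(u, variances)) / su ** 2
    return pooled, pooled_var, tau2
```

A gene-level test statistic could then be formed as `pooled / math.sqrt(pooled_var)` and compared across genes. With all quality scores equal to 1, the pooled variance reduces to the familiar reciprocal of the summed random-effects weights.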

[20]  J. Welsh,et al.  Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. , 2001, Cancer research.

[21]  Jeffrey E. Harris,et al.  Bayes Methods for Combining the Results of Cancer Studies in Humans and other Species , 1983 .

[22]  Daniel Q. Naiman,et al.  Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data , 2005, Bioinform..

[23]  Joseph Beyene,et al.  Integrative Analysis of Gene Expression Data Including an Assessment of Pathway Enrichment for Predicting Prostate Cancer , 2006, Cancer informatics.

[24]  P. Brown,et al.  Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[25]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[26]  I. Yang,et al.  Multi-platform, multi-site, microarray-based human tumor classification. , 2004, The American journal of pathology.

[27]  Rafael A. Irizarry,et al.  A Model-Based Background Adjustment for Oligonucleotide Expression Arrays , 2004 .

[28]  Weida Tong,et al.  Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data , 2005, Nucleic acids research.

[29]  David S. Wishart,et al.  Applications of Machine Learning in Cancer Prediction and Prognosis , 2006, Cancer informatics.

[30]  Yudong D. He,et al.  Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.

[31]  E. Lander,et al.  Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.

[32]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[33]  Giovanni Parmigiani,et al.  A Cross-Study Comparison of Gene Expression Studies for the Molecular Classification of Lung Cancer , 2004, Clinical Cancer Research.

[34]  Xiao-Hua Zhou,et al.  Statistical Methods for Meta‐Analysis , 2008 .

[35]  J. Wang-Rodriguez,et al.  In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[36]  Gordon K Smyth,et al.  Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments , 2004, Statistical applications in genetics and molecular biology.

[37]  S. Dudoit,et al.  Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data , 2002 .

[38]  L. Hedges,et al.  The Handbook of Research Synthesis , 1995 .

[39]  B. Conley,et al.  Detection of Prostate Cancer and Predicting Progression , 2004, Clinical Cancer Research.

[40]  D Tritchler,et al.  Modelling study quality in meta-analysis. , 1999, Statistics in medicine.

[41]  Ruth Etzioni,et al.  Combining Results of Microarray Experiments: A Rank Aggregation Approach , 2006 .

[42]  T. Barrette,et al.  Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. , 2002, Cancer research.

[43]  Roland Eils,et al.  Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes , 2005, BMC Bioinformatics.