Meta-analysis for pathway enrichment analysis when combining multiple genomic studies

MOTIVATION: Many pathway analysis (or gene set enrichment analysis) methods have been developed to identify pathways enriched between different biological states within a single genomic study. As microarray datasets accumulate, meta-analysis methods have also been developed to integrate information across multiple studies. However, most current meta-analysis methods for combining genomic studies focus on biomarker detection, and meta-analysis for pathway analysis has not been systematically pursued.

RESULTS: We investigated two approaches to meta-analysis for pathway enrichment (MAPE): combining statistical significance across studies at the gene level (MAPE_G) or at the pathway level (MAPE_P). Simulations showed that the meta-analysis approaches have greater statistical power than single-study analysis, and that MAPE_G and MAPE_P have complementary advantages under different scenarios. We also developed an integrated method (MAPE_I) that incorporates the advantages of both approaches. We compared the performance of the three MAPE variants through comprehensive simulations and through applications to real data on drug response in breast cancer cell lines and on lung cancer tissues. MAPE_P has the advantage of not requiring gene matching across studies. When MAPE_G and MAPE_P show complementary advantages, the hybrid MAPE_I is generally recommended.

AVAILABILITY: http://www.biostat.pitt.edu/bioinfo/

CONTACT: ctseng@pitt.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
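The difference between combining significance at the gene level (MAPE_G) and at the pathway level (MAPE_P) can be illustrated with a small Python sketch on toy data. This is not the authors' implementation, and every function name is hypothetical. Three choices are assumptions not stated in the abstract: p-values from K independent studies are combined with the maxP statistic, which equals (max p)^K under the null; enrichment is a one-sided hypergeometric test on genes with p < alpha; and MAPE_I is approximated by a conservative Bonferroni-adjusted minimum of the MAPE_G and MAPE_P p-values.

```python
import math
from typing import Dict, List, Set

# Hypothetical sketch of the MAPE_G / MAPE_P / MAPE_I idea (not the authors' code).
# Assumptions: maxP combination across independent studies, a one-sided
# hypergeometric enrichment test, and a Bonferroni-style stand-in for MAPE_I.
GeneP = Dict[str, float]  # gene -> p-value within one study


def maxp_combine(pvalues: List[float]) -> float:
    """maxP: with K independent uniform null p-values, P(max p <= x) = x**K."""
    return max(pvalues) ** len(pvalues)


def hypergeom_enrichment(hits: Set[str], pathway: Set[str], universe: Set[str]) -> float:
    """One-sided P(overlap >= observed) between significant genes and a pathway."""
    n_total = len(universe)
    n_path = len(pathway & universe)
    n_hits = len(hits & universe)
    k = len(hits & pathway & universe)
    upper = min(n_path, n_hits)
    tail = sum(math.comb(n_path, i) * math.comb(n_total - n_path, n_hits - i)
               for i in range(k, upper + 1))
    return tail / math.comb(n_total, n_hits)


def mape_g(studies: List[GeneP], pathway: Set[str], alpha: float = 0.05) -> float:
    """Gene level: combine each gene's p-values across studies, then test once."""
    shared = set.intersection(*(set(s) for s in studies))  # requires gene matching
    combined = {g: maxp_combine([s[g] for s in studies]) for g in shared}
    hits = {g for g, p in combined.items() if p < alpha}
    return hypergeom_enrichment(hits, pathway, shared)


def mape_p(studies: List[GeneP], pathway: Set[str], alpha: float = 0.05) -> float:
    """Pathway level: test enrichment within each study, then combine p-values."""
    per_study = []
    for s in studies:
        universe = set(s)  # each study keeps its own gene universe
        hits = {g for g, p in s.items() if p < alpha}
        per_study.append(hypergeom_enrichment(hits, pathway, universe))
    return maxp_combine(per_study)


def mape_i(studies: List[GeneP], pathway: Set[str], alpha: float = 0.05) -> float:
    """Conservative hybrid: Bonferroni-adjusted minimum of MAPE_G and MAPE_P."""
    p_g = mape_g(studies, pathway, alpha)
    p_p = mape_p(studies, pathway, alpha)
    return min(1.0, 2 * min(p_g, p_p))


# Toy data: two studies with partially overlapping gene sets.
studies = [
    {"g1": 0.001, "g2": 0.002, "g3": 0.5, "g4": 0.7, "g5": 0.9, "g6": 0.4},
    {"g1": 0.01, "g2": 0.003, "g3": 0.6, "g4": 0.8, "g5": 0.3, "g7": 0.2},
]
pathway = {"g1", "g2", "g3"}

if __name__ == "__main__":
    print(round(mape_g(studies, pathway), 3),
          round(mape_p(studies, pathway), 3),
          round(mape_i(studies, pathway), 3))  # prints: 0.3 0.04 0.08
```

In this sketch `mape_g` can only use the genes measured in every study (g1 to g5), whereas `mape_p` tests each study against its own gene universe, which mirrors why the pathway-level route needs no gene matching. Both analytic tests treat genes as independent; permutation-based p-values are the usual remedy for gene-gene correlation in practice.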
