Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline

Background: As high-throughput genomic technologies become more accurate and affordable, a growing number of data sets have accumulated in the public domain, and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, in which multiple microarray studies addressing related biological hypotheses are combined to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have been only minimally investigated. There is currently no clear conclusion or guideline on how to choose a meta-analysis method for a given application; the decision requires both statistical and biological considerations.

Results: We applied 12 microarray meta-analysis methods to combine multiple simulated expression profiles. The methods can be categorized by the hypothesis setting they target: (1) HSA: differentially expressed (DE) genes with non-zero effect sizes in all studies, (2) HSB: DE genes with non-zero effect sizes in one or more studies, and (3) HSr: DE genes with non-zero effect sizes in the "majority" of studies. We then performed a comprehensive comparative analysis on six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated the hypothesis settings behind the methods and further applied multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and the data structure, respectively.

Conclusions: The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HSA, HSB, and HSr). Evaluation on real data, together with the MDS and entropy analyses, provided an insightful and practical guideline for choosing the most suitable method in a given application.
All source files for simulation and real data are available on the author’s publication website.
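To make the three hypothesis settings concrete, the following minimal sketch combines the per-study p-values of a single gene with order-statistic rules. It is an illustration only and not the authors' implementation: the abstract does not name the 12 methods, so pairing the maximum p-value with HSA, the minimum p-value with HSB and the r-th smallest p-value with HSr is an assumption made here, as are the function names `order_stat_pvalue` and `combine` and the choice of "majority" as more than half of the studies. The sketch relies on the fact that, under the global null, independent p-values are Uniform(0, 1), so the r-th smallest of K follows a Beta(r, K - r + 1) distribution.

```python
from math import comb


def order_stat_pvalue(pvals, r):
    """Combined p-value based on the r-th smallest of K independent p-values.

    Under the global null the r-th order statistic is Beta(r, K - r + 1);
    for integer parameters its CDF equals the binomial tail sum used below.
    """
    k = len(pvals)
    if not 1 <= r <= k:
        raise ValueError("r must be between 1 and the number of studies")
    x = sorted(pvals)[r - 1]  # r-th smallest p-value
    return sum(comb(k, j) * x**j * (1 - x) ** (k - j) for j in range(r, k + 1))


def combine(pvals):
    """Hypothetical helper: one combined p-value per hypothesis setting."""
    k = len(pvals)
    majority = k // 2 + 1  # assumption: "majority" means more than half
    return {
        "HSA_maxP": order_stat_pvalue(pvals, k),        # effect in all studies
        "HSB_minP": order_stat_pvalue(pvals, 1),        # effect in at least one
        "HSr_rOP": order_stat_pvalue(pvals, majority),  # effect in a majority
    }


if __name__ == "__main__":
    # A gene with strong signal in three of four studies (made-up p-values).
    for name, p in combine([0.001, 0.002, 0.8, 0.003]).items():
        print(f"{name}: {p:.3g}")
```

With these made-up inputs, the rule targeting HSA is dominated by the single non-significant study, while the rules targeting HSB and HSr still yield small combined p-values. This is the kind of disagreement that makes the choice of hypothesis setting matter in practice.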
