P-value evaluation, variability index and biomarker categorization for adaptively weighted Fisher's meta-analysis method in omics applications

MOTIVATION Meta-analysis methods have been widely used to combine results from multiple clinical or genomic studies to increase statistical power and ensure robust and accurate conclusions. The adaptively weighted Fisher's method (AW-Fisher), initially developed for omics applications but applicable to general meta-analysis, is an effective approach for combining p-values from K independent studies. It also improves biological interpretability by characterizing which studies contribute to the meta-analysis. Currently, AW-Fisher lacks both a fast p-value computation and a variability estimate for the AW weights. In addition, when the number of studies K is large, the 3^K - 1 possible differential expression pattern categories generated by AW-Fisher can become intractable. In this paper, we develop an importance sampling scheme with spline interpolation to increase the accuracy and speed of the p-value calculation. We also apply bootstrapping to construct a variability index for the AW-Fisher weight estimator and a co-membership matrix that categorizes (clusters) differentially expressed genes by their meta-patterns for intuitive biological investigation.

RESULTS The proposed methods show superior performance in simulations, and two real omics meta-analysis applications illustrate the biological insights they provide.

AVAILABILITY An R package, AWFisher (calling C++), is available at Bioconductor and GitHub (https://github.com/Caleb-Huo/AWFisher), and all datasets and code for this paper are available in the supplementary materials.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
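To make the idea of adaptive weights concrete, the following is a minimal Python sketch of the AW-Fisher statistic as it is commonly described: the smallest Fisher combined p-value over all nonzero 0/1 weight vectors. It is an illustration under stated assumptions, not the AWFisher package's implementation. The function names (`chi2_sf_even`, `aw_fisher_statistic`, `n_signed_patterns`) are hypothetical, and the reading of the 3^K - 1 count as "each study is up-regulated, down-regulated, or not contributing, excluding the all-zero pattern" is an assumption. The returned statistic is not itself a p-value; its null distribution is what the importance sampling scheme is meant to evaluate.

```python
import math

def chi2_sf_even(x, m):
    """Upper tail of a chi-square with 2*m degrees of freedom (closed form)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, m):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def aw_fisher_statistic(pvalues):
    """Return (statistic, weights) for one gene.

    For a fixed number m of contributing studies, the best subset is the
    m smallest p-values, so only K candidate subsets need to be checked.
    """
    order = sorted(range(len(pvalues)), key=lambda k: pvalues[k])
    best_stat, best_m = float("inf"), 0
    running = 0.0
    for m, k in enumerate(order, start=1):
        running += -2.0 * math.log(pvalues[k])   # Fisher sum over chosen studies
        stat = chi2_sf_even(running, m)
        if stat < best_stat:
            best_stat, best_m = stat, m
    weights = [0] * len(pvalues)
    for k in order[:best_m]:
        weights[k] = 1
    return best_stat, weights

def n_signed_patterns(K):
    """Assumed count of signed patterns: {-1, 0, +1}^K minus the all-zero one."""
    return 3 ** K - 1

stat, weights = aw_fisher_statistic([0.01, 0.02, 0.9])
print(weights, n_signed_patterns(3))
```

In this sketch the weight vector marks which studies drive the signal for a gene; the bootstrap-based variability index described in the abstract would quantify how stable those 0/1 weights are under resampling.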
