Probabilistic prioritization of candidate pathway association with pathway score

Background: Current methods for gene-set or pathway analysis are usually designed to test the enrichment of a single gene-set. Once the analysis has been carried out for each set under study, a list of significant sets can be obtained. However, if one wishes to further prioritize these sets by importance or strength of association, no quantitative measure is available. Ranking the pathways by the magnitude of their p-values may not be appropriate, because a p-value does not measure the strength of an association. In addition, when each pathway is tested, these analyses are often implicitly affected by the number of differentially expressed genes in the set, by the dependence among genes, or by both.

Results: Here we propose a two-stage procedure to prioritize pathways/gene-sets. In the first stage we develop a pathway-level measure with three properties. First, it includes all genes in the set, differentially expressed or not, and summarizes their collective effect per sample. Second, the pathway score accounts for the correlation between genes by synchronizing their correlation directions. Third, the score includes a rank transformation, both to enhance the variation among samples and to avoid the influence of extreme heterogeneity among genes. In the second stage, all scores are included simultaneously in a Bayesian logistic regression model, which evaluates the strength of association for each set and ranks the sets by posterior probability. Simulations from Gaussian distributions and from human microarray data, together with a breast cancer study using RNA-Seq, are used for demonstration and for comparison with existing methods.

Conclusions: The proposed summary pathway score provides, for each sample, an overall evaluation of gene expression in a gene-set. It demonstrates the advantages of including all genes in the set and of synchronizing correlation directions. The simultaneous use of all pathway-level scores in a Bayesian model not only offers a probabilistic evaluation and ranking of pathway association but also shows good accuracy in identifying the top-ranking pathways. The resulting list of ranked pathways can serve as a reference for potential targeted therapy or for the future allocation of research resources.
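The abstract does not give the formula for the first-stage score, so the following is only a minimal sketch of the three stated properties (all genes included, correlation directions synchronized, rank transformation), not the authors' definition. The function names `rank_columns` and `pathway_score`, the use of the leading eigenvector of the rank correlation matrix to choose gene signs, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def rank_columns(x):
    """Rank each gene (column) across samples (rows); ranks run 1..n.

    Ties are broken by sample order (an assumption; the abstract does not
    say how ties are handled).
    """
    order = np.argsort(x, axis=0, kind="stable")
    return (np.argsort(order, axis=0, kind="stable") + 1).astype(float)


def pathway_score(expr):
    """Hypothetical per-sample score for one gene set.

    expr: array of shape (n_samples, n_genes) holding expression values of
    ALL genes in the set, differentially expressed or not.
    Assumes no gene is constant across samples (its correlation would be NaN).
    """
    ranks = rank_columns(expr)                # rank transformation
    centered = ranks - ranks.mean(axis=0)     # center each gene's ranks

    # Assumed synchronization rule: flip genes so that they agree in sign
    # with the leading eigenvector of the rank correlation matrix.
    corr = np.corrcoef(ranks, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)         # eigenvalues in ascending order
    leading = eigvecs[:, -1]
    signs = np.where(leading >= 0, 1.0, -1.0)
    if signs.sum() < 0:                       # resolve eigenvector sign ambiguity
        signs = -signs

    # Collective effect of all genes, one number per sample.
    return centered @ signs


# Toy gene set: genes 0 and 2 rise across samples, gene 1 falls.
expr = np.array([
    [1.0, 5.0, 2.0],
    [2.0, 4.0, 4.0],
    [3.0, 3.0, 6.0],
    [4.0, 2.0, 8.0],
    [5.0, 1.0, 11.0],
])
scores = pathway_score(expr)
```

In this toy set the falling gene is flipped before summation, so it reinforces the other two genes instead of cancelling one of them. Scores computed this way for every gene set would then be the covariates of the second-stage Bayesian logistic regression, which is not sketched here.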
