Comparative study on gene set and pathway topology-based enrichment methods

Background: Enrichment analysis is a popular approach for identifying pathways or sets of genes that are significantly enriched among differentially expressed genes. The traditional gene set enrichment approach treats a pathway as a simple gene list and disregards any knowledge of gene or protein interactions. In contrast, the newer group of so-called pathway topology-based methods integrates the topological structure of a pathway into the analysis.

Methods: We compared gene set and pathway topology-based enrichment approaches, considering three gene set methods and four topological methods. The methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, with the same pathway input data provided to all methods.

Results: In the benchmark data analysis, both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with the original KEGG pathways, which show considerable gene overlap with one another. In this study, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways constructed from unique gene IDs. Here, the methods accounting for pathway topology reached higher accuracy than the gene set methods, but their sensitivity was lower.

Conclusions: We conducted one of the first comprehensive comparisons of gene set and pathway topology-based enrichment methods. The topological methods performed better in the simulation scenarios with non-overlapping pathways, but they were not conclusively better in the other scenarios. This suggests that a simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both types of enrichment methods require further improvement to deal with the problem of pathway overlaps.
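To make the gene set view concrete, the following is a minimal sketch of an over-representation test in which a pathway is nothing more than a set of gene IDs and no interaction information is used. It is an illustration under stated assumptions, not one of the methods evaluated in the study: the function name `ora_p_value`, the toy gene IDs, and the choice of a one-sided hypergeometric test are all assumptions made for this example.

```python
from math import comb


def ora_p_value(universe, pathway, de_genes):
    """One-sided hypergeometric p-value for over-representation.

    universe : set of all measured gene IDs
    pathway  : set of gene IDs in the pathway (topology is ignored)
    de_genes : set of differentially expressed gene IDs
    Returns P(X >= observed overlap) when len(de_genes) genes are drawn
    at random, without replacement, from the universe.
    """
    pathway = pathway & universe      # keep only measured genes
    de_genes = de_genes & universe
    n_universe = len(universe)
    n_pathway = len(pathway)
    n_de = len(de_genes)
    observed = len(pathway & de_genes)

    # Sum the upper tail of the hypergeometric distribution.
    upper = min(n_pathway, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(observed, upper + 1)
    )
    return tail / comb(n_universe, n_de)


# Toy data (hypothetical gene IDs): 20 measured genes, a 5-gene pathway,
# and 4 differentially expressed genes, 3 of which lie in the pathway.
universe = {f"g{i}" for i in range(20)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
de_genes = {"g0", "g1", "g2", "g10"}

p = ora_p_value(universe, pathway, de_genes)
```

Because only set membership enters the calculation, two pathways that share many genes will tend to receive similar p-values, which is one way the overlap problem noted in the conclusions can arise.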
