Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap

Pathway enrichment analysis helps researchers gain mechanistic insight into gene lists generated from genome-scale (omics) experiments. This method identifies biological pathways that are enriched in a gene list more than would be expected by chance. We explain the procedures of pathway enrichment analysis and present a practical step-by-step guide to help interpret gene lists resulting from RNA-seq and genome-sequencing experiments. The protocol comprises three major steps: definition of a gene list from omics data, determination of statistically enriched pathways, and visualization and interpretation of the results. We describe how to use this protocol with published examples of differentially expressed genes and mutated cancer genes; however, the principles can be applied to diverse types of omics data. The protocol describes innovative visualization techniques, provides comprehensive background and troubleshooting guidelines, and uses freely available and frequently updated software, including g:Profiler, Gene Set Enrichment Analysis (GSEA), Cytoscape and EnrichmentMap. The complete protocol can be performed in ~4.5 h and is designed for use by biologists with no prior bioinformatics training.

This protocol describes pathway enrichment analysis of gene lists from RNA-seq and other genomics experiments using g:Profiler, GSEA, Cytoscape and EnrichmentMap software.
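
To make "enriched more than would be expected by chance" concrete, the following is a minimal sketch of one common way to score a single pathway: a one-sided hypergeometric (Fisher's exact) test on the overlap between a gene list and a pathway. It is an illustration under stated assumptions, not the implementation used by the tools named above; the function name, its parameters and the example counts are all hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_list, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    Hypothetical helper for illustration only.
    n_background: number of genes in the background set
    n_pathway:    number of background genes annotated to the pathway
    n_list:       number of background genes in the gene list of interest
    n_overlap:    number of gene-list genes that are also in the pathway

    Returns the probability of seeing at least n_overlap pathway genes
    when n_list genes are drawn at random from the background.
    """
    max_overlap = min(n_pathway, n_list)
    # Number of ways to draw any gene list of this size from the background.
    total = comb(n_background, n_list)
    # Sum the upper tail: overlaps of size n_overlap, n_overlap + 1, ...
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_list - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Made-up counts: 20,000 background genes, a 100-gene pathway,
# a 200-gene list, and 10 genes shared between list and pathway.
p = enrichment_p_value(20000, 100, 200, 10)
print(f"p = {p:.3g}")
```

With these made-up counts, about one shared gene would be expected by chance, so an overlap of ten yields a small p-value. In practice many pathways are tested at once, so p-values of this kind need multiple-testing correction before interpretation.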
