pathDIP: an annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis

Molecular pathway data are essential in current computational and systems biology research. While there are many primary and integrated pathway databases, several challenges remain: low proteome coverage (57%), low overlap across different databases, no direct information about the underlying physical connectivity of pathway members, and a high fraction of protein-coding genes without any pathway annotation, i.e. ‘pathway orphans’. To address these challenges, we developed pathDIP, which integrates data from 20 source pathway databases (‘core pathways’) with physical protein–protein interactions to predict biologically relevant protein–pathway associations, referred to as ‘extended pathways’. In cross-validation, our predictions achieved a 71% recovery rate. Data integration and predictions increase the coverage of pathway annotations for protein-coding genes to 86% and provide novel annotations for 5732 pathway orphans. pathDIP (http://ophid.utoronto.ca/pathdip) annotates 17 070 protein-coding genes with 4678 pathways and provides multiple query, analysis and output options.
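As a rough illustration of what pathway enrichment analysis over gene-pathway associations can look like, the following is a minimal sketch of a one-sided hypergeometric over-representation test. This is an assumption for illustration only: the abstract does not state which statistic pathDIP uses, and the names `hypergeom_sf`, `enrich`, the gene identifiers and the pathway labels are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(query, pathways, background):
    """Rank pathways by over-representation of query genes (illustrative only)."""
    background = set(background)
    query = set(query) & background  # ignore genes outside the background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = query & members
        p = hypergeom_sf(len(overlap), len(background), len(members), len(query))
        results.append((name, len(overlap), p))
    # Smallest p-value first; no multiple-testing correction is applied here.
    return sorted(results, key=lambda r: r[2])


# Toy data: hypothetical gene and pathway names, not taken from pathDIP.
background = {f"g{i}" for i in range(10)}
pathways = {"A": {"g0", "g1", "g2"}, "B": {"g5", "g6", "g7", "g8"}}
ranked = enrich({"g0", "g1", "g2"}, pathways, background)
```

In practice the p-values would be corrected for multiple testing across all pathways, and the choice of background gene set matters; the sketch omits both for brevity.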
