DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants

Information about the genetic basis of human diseases lies at the heart of precision medicine and drug discovery. However, to realize its full potential in support of these goals, several problems must be overcome, such as the fragmentation, heterogeneity, availability and differing conceptualization of the data. To provide the community with a resource free of these hurdles, we have developed DisGeNET (http://www.disgenet.org), one of the largest available collections of genes and variants involved in human diseases. DisGeNET integrates data from expert-curated repositories, GWAS catalogues, animal models and the scientific literature. DisGeNET data are homogeneously annotated with controlled vocabularies and community-driven ontologies. Additionally, several original metrics are provided to assist the prioritization of genotype–phenotype relationships. The information is accessible through a web interface, a Cytoscape App, an RDF SPARQL endpoint, scripts in several programming languages and an R package. DisGeNET is a versatile platform that can be used for different research purposes, including the investigation of the molecular underpinnings of specific human diseases and their comorbidities, the analysis of the properties of disease genes, the generation of hypotheses on drug therapeutic action and drug adverse effects, the validation of computationally predicted disease genes and the evaluation of the performance of text-mining methods.
