History of rare diseases and their genetic causes - a data-driven approach

This dataset describes rare monogenic diseases with a known genetic cause, supplemented with manually extracted provenance for both the first description of each disease and the discovery of its underlying genetic cause. We collected 4166 rare monogenic diseases by their OMIM identifiers and linked them to 3163 causative genes, which are annotated with Ensembl identifiers and HGNC symbols. For each disease, we added the PubMed identifier of the publication that first described the disease and of the publication that identified the causative gene, using information from OMIM, Wikipedia, Google Scholar, Whonamedit, and PubMed. The data is available as a spreadsheet and as RDF in a semantic model modified from DisGeNET. The dataset relies on publicly available data and publications with PubMed IDs, but to our knowledge this is the first time this data has been linked and made available for further study under a liberal license. Analysis of the data reveals the timeline of rare disease and causative gene discovery and links these discoveries to developments in methods and databases.

[8]  Susan Tweedie,et al.  Genenames.org: the HGNC and VGNC resources in 2017 , 2016, Nucleic Acids Res..

[9]  Christian Gilissen,et al.  Disease gene identification strategies for exome sequencing , 2012, European Journal of Human Genetics.

[10]  Núria Queralt-Rosinach,et al.  DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants , 2016, Nucleic Acids Res..

[11]  Astrid Gall,et al.  Ensembl 2018 , 2017, Nucleic Acids Res..

[12]  R. Edwards,et al.  Transcriptome analysis of human brain tissue identifies reduced expression of complement complex C1Q Genes in Rett syndrome , 2016, BMC Genomics.

[13]  C. Evelo,et al.  From SNPs to pathways: Biological interpretation of type 2 diabetes (T2DM) genome wide association study (GWAS) results , 2018, PloS one.

[14]  P. Shannon,et al.  Cytoscape: a software environment for integrated models of biomolecular interaction networks. , 2003, Genome research.

[15]  D. Grange,et al.  Osteogenesis imperfecta type IV. Detection of a point mutation in one alpha 1(I) collagen allele (COL1A1) by RNA/RNA hybrid analysis. , 1989, The Journal of biological chemistry.

[16]  Friederike Ehrhart,et al.  CyTargetLinker app update: A flexible solution for network extension in Cytoscape , 2018, F1000Research.

[17]  Tobias Kuhn,et al.  nanopub-java: A Java Library for Nanopublications , 2015, LISC@ISWC.

[18]  Mark D. Wilkinson,et al.  MECP2 variation in Rett syndrome—An overview of current coverage of genetic and phenotype data within existing databases , 2018, Human mutation.