SIDD: A Semantically Integrated Database towards a Global View of Human Disease

Background: A number of databases have been developed to collect disease-related molecular, phenotypic and environmental features (DR-MPEs), such as genes, non-coding RNAs, genetic variations, drugs, phenotypes and environmental factors. However, each existing database focuses on only one or two types of DR-MPE. There is an urgent need for an integrated database that establishes semantic associations among disease-related databases and links them to provide a global view of human disease at the biological level. Such a database would allow researchers to query various DR-MPEs by disease and to investigate disease mechanisms using different types of data.

Methodology: To build an integrated disease-associated database, the disease vocabularies used in different databases are mapped to the Disease Ontology (DO) by semantic matching. In total, 4,284 disease terms from Medical Subject Headings (MeSH) and 4,186 disease terms from Online Mendelian Inheritance in Man (OMIM) are mapped to DO. The relationships between DR-MPEs and diseases are then extracted from the source databases and merged to reduce data redundancy.

Conclusions: We developed SIDD, a semantically integrated disease-associated database that integrates 18 disease-associated databases and lets researchers browse multiple types of DR-MPEs in a single view. Its web interface supports queries either by browsing a disease ontology tree or by searching for a disease term. SIDD also provides a network visualization tool, built with the Cytoscape Web plugin, for viewing the relationships between diseases and DR-MPEs. The current version of SIDD (July 2013) documents 4,465,131 entries relating to 139,365 DR-MPEs and 3,824 human diseases. The database is freely accessible at http://mlg.hit.edu.cn/SIDD.
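The abstract does not say how the semantic matching or the merge is carried out. The Python sketch below is one minimal way to realize the two Methodology steps, under loudly stated assumptions: DO terms are available as a dictionary from DO ID to name plus synonyms; "semantic matching" is approximated by exact matching of normalized strings (a deliberately simpler technique than whatever SIDD actually uses); and each source database yields (disease term, entity type, entity ID) records. All function names, DO IDs and data in it are hypothetical.

```python
# Hypothetical sketch: map source disease terms to Disease Ontology (DO) IDs,
# then merge disease-entity associations reported by several databases.
# ASSUMPTION: exact matching of normalized strings stands in for SIDD's
# (unspecified) semantic matching.
from __future__ import annotations

import re
from collections import defaultdict


def normalize(term: str) -> str:
    """Lower-case, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", term.lower()).split())


def build_do_index(do_terms: dict[str, list[str]]) -> dict[str, str]:
    """Index every DO name and synonym (normalized) to its DO ID."""
    index: dict[str, str] = {}
    for doid, labels in do_terms.items():
        for label in labels:
            # Keep the first DO ID seen if two terms share a normalized label.
            index.setdefault(normalize(label), doid)
    return index


def map_to_do(term: str, index: dict[str, str]) -> str | None:
    """Return the DO ID for a source disease term, or None if unmapped."""
    return index.get(normalize(term))


def merge_associations(sources: dict[str, list[tuple[str, str, str]]],
                       index: dict[str, str]):
    """Merge (disease term, entity type, entity ID) records across sources.

    Returns a dict keyed by (DO ID, entity type, entity ID) whose value is the
    set of source databases reporting that association, plus a list of
    (database, term) pairs whose disease term could not be mapped to DO.
    """
    merged: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    unmapped: list[tuple[str, str]] = []
    for db, records in sources.items():
        for disease_term, entity_type, entity_id in records:
            doid = map_to_do(disease_term, index)
            if doid is None:
                unmapped.append((db, disease_term))
            else:
                # Same association from several sources collapses to one key.
                merged[(doid, entity_type, entity_id)].add(db)
    return dict(merged), unmapped


if __name__ == "__main__":
    # Toy data; "DOID:X1"/"DOID:X2" are placeholders, not real DO identifiers.
    index = build_do_index({
        "DOID:X1": ["Prostate cancer", "prostate carcinoma"],
        "DOID:X2": ["Pancreatic cancer"],
    })
    merged, unmapped = merge_associations({
        "db_a": [("Prostate Cancer", "gene", "geneA"),
                 ("Pancreatic cancer", "miRNA", "mirB")],
        "db_b": [("prostate carcinoma", "gene", "geneA"),
                 ("Unknown syndrome", "gene", "geneC")],
    }, index)
    for key, dbs in merged.items():
        print(key, sorted(dbs))
    print("unmapped:", unmapped)
    # Prints:
    # ('DOID:X1', 'gene', 'geneA') ['db_a', 'db_b']
    # ('DOID:X2', 'miRNA', 'mirB') ['db_a']
    # unmapped: [('db_b', 'Unknown syndrome')]
```

Keying merged records on (DO ID, entity type, entity ID) collapses the same association reported by several sources into one entry while keeping the set of contributing databases as provenance. Terms that fail to map are returned separately, so they can be reviewed rather than silently dropped.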
