DIDO: a disease-determinants ontology from web sources

This paper introduces DIDO, a system that provides convenient access to knowledge about factors involved in human diseases, automatically extracted from textual Web sources. The knowledge base is bootstrapped by integrating entities from hand-crafted sources such as MeSH and OMIM. Because these sources contain few relationships between different types of biomedical entities, DIDO employs flexible and robust pattern learning and constraint-based reasoning methods to automatically extract new relational facts from text. These facts can then be added iteratively to the knowledge base. The result is a semantic graph of typed entities and relations between diseases, their symptoms, and their factors, with an emphasis on environmental factors but also covering molecular determinants. We demonstrate the value of DIDO for knowledge discovery about causal factors and properties of complex diseases, including factor-disease chains.
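
To make the notion of a typed semantic graph and a factor-disease chain concrete, the following minimal Python sketch may help. It is an illustration only and not DIDO's implementation: the names `Entity`, `KnowledgeGraph`, `add_fact`, and `chains`, the relation labels, and the three hand-entered example facts are all assumptions introduced here. It stores facts as typed subject-relation-object triples and enumerates paths from a given factor that end at a disease.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A typed node; the type labels used here are illustrative assumptions."""
    name: str
    etype: str  # e.g. "factor", "disease", "symptom"


class KnowledgeGraph:
    """Hypothetical toy store of (subject, relation, object) facts."""

    def __init__(self):
        # subject -> list of (relation, object) pairs
        self._out = defaultdict(list)

    def add_fact(self, subj, relation, obj):
        """Add one fact, ignoring exact duplicates."""
        if (relation, obj) not in self._out[subj]:
            self._out[subj].append((relation, obj))

    def chains(self, start, max_len=3):
        """Return every cycle-free path of at most max_len edges that
        starts at `start` and ends at an entity of type "disease"."""
        results = []

        def walk(node, path, seen):
            if len(path) >= max_len:
                return
            for relation, nxt in self._out.get(node, []):
                if nxt in seen:
                    continue  # avoid revisiting entities on this path
                step = path + [(relation, nxt)]
                if nxt.etype == "disease":
                    results.append(step)
                walk(nxt, step, seen | {nxt})

        walk(start, [], {start})
        return results


# Hand-entered toy facts for illustration; these are not DIDO output.
obesity = Entity("obesity", "factor")
t2d = Entity("type 2 diabetes", "disease")
retinopathy = Entity("diabetic retinopathy", "disease")
thirst = Entity("excessive thirst", "symptom")

kg = KnowledgeGraph()
kg.add_fact(obesity, "increases_risk_of", t2d)
kg.add_fact(t2d, "causes", retinopathy)
kg.add_fact(t2d, "has_symptom", thirst)

found = kg.chains(obesity)
for chain in found:
    print(" -> ".join([obesity.name] + [e.name for _, e in chain]))
```

In this toy setting, iteratively adding extracted facts corresponds to further `add_fact` calls, after which `chains` can surface longer factor-disease paths. A real system would additionally need entity typing from sources such as MeSH and OMIM and consistency checks on extracted facts, neither of which this sketch attempts.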