Mining microbe–disease interactions from literature via a transfer learning model

Background: Interactions between microbes and diseases are of great importance for biomedical research. However, large-scale curated databases of microbe-disease interactions are missing, because the related literature is enormous and manual curation is costly and time-consuming. In this paper, we aim to construct a large-scale database of microbe-disease interactions automatically. We achieved this goal at a moderate curation cost by applying text mining methods based on a deep learning model. We also built a user-friendly web interface that allows researchers to navigate and query the desired information.

Results: For curation, we manually constructed a gold-standard corpus (GSC) and a silver-standard corpus (SSC) for microbe-disease interactions. We then proposed a text mining framework for microbe-disease interaction extraction that does not require building a model from scratch. First, we applied named entity recognition (NER) tools to detect microbe and disease mentions in texts. Second, we transferred BERE, a deep learning model originally built for drug-target and drug-drug interactions, to recognize relations between the detected entities. Introducing the SSC for model fine-tuning greatly improves the detection of microbe-disease interactions, with an average reduction in error of approximately 10%. The resulting MDIDB website offers data browsing, custom search for specific diseases or microbes, and batch download.

Conclusions: Evaluation results demonstrate that our method outperforms the baseline model (rule-based PKDE4J), achieving an average F1-score of 73.81%. For further validation, we randomly sampled nearly 1,000 interactions predicted by our model and manually checked the correctness of each one, which gave an accuracy of 73%. The MDIDB website is freely available at http://dbmdi.com/index/
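
The two-stage extraction outlined in the Results (entity detection followed by relation recognition) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicons `MICROBES` and `DISEASES`, the helper names, the cue-word classifier, and the labels `related` and `NA` are all hypothetical stand-ins. A dictionary lookup replaces the NER tools, and a trivial keyword rule replaces the fine-tuned BERE model, so only the flow of data between the two stages should be read as representative.

```python
import re
from itertools import product
from typing import List, NamedTuple, Set, Tuple

# Hypothetical toy lexicons; a real pipeline would rely on dedicated NER tools.
MICROBES = {"helicobacter pylori", "escherichia coli"}
DISEASES = {"gastric cancer", "peptic ulcer", "diarrhea"}


class Mention(NamedTuple):
    text: str   # surface form as it appears in the sentence
    kind: str   # "microbe" or "disease"
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


def find_mentions(sentence: str, lexicon: Set[str], kind: str) -> List[Mention]:
    """Stage 1 (stand-in for NER): case-insensitive dictionary matching."""
    mentions = []
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            mentions.append(Mention(m.group(0), kind, m.start(), m.end()))
    return sorted(mentions, key=lambda x: x.start)


def candidate_pairs(sentence: str) -> List[Tuple[Mention, Mention]]:
    """Pair every microbe mention with every disease mention in one sentence."""
    microbes = find_mentions(sentence, MICROBES, "microbe")
    diseases = find_mentions(sentence, DISEASES, "disease")
    return list(product(microbes, diseases))


def classify_relation(sentence: str, microbe: Mention, disease: Mention) -> str:
    """Stage 2 (stand-in for the relation model): a naive cue-word rule.

    The entity arguments are unused here; a learned model would condition
    on their positions in the sentence.
    """
    cues = ("cause", "associated with", "risk factor")
    lowered = sentence.lower()
    return "related" if any(cue in lowered for cue in cues) else "NA"


def extract_interactions(text: str) -> List[Tuple[str, str, str]]:
    """Run both stages sentence by sentence and collect labeled pairs."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for microbe, disease in candidate_pairs(sentence):
            label = classify_relation(sentence, microbe, disease)
            results.append((microbe.text, disease.text, label))
    return results


if __name__ == "__main__":
    demo = ("Helicobacter pylori is a risk factor for gastric cancer. "
            "Escherichia coli was cultured in the lab.")
    for triple in extract_interactions(demo):
        print(triple)
```

In this sketch a sentence contributes candidates only when it contains at least one microbe and one disease mention, so the second demo sentence yields no pair. Swapping `classify_relation` for a trained model leaves the rest of the flow unchanged, which is the sense in which the relation stage can be transferred rather than rebuilt.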