Challenges in the construction of knowledge bases for human microbiome-disease associations

The last few years have seen tremendous growth in human microbiome research, with a particular focus on links to both mental and physical health and disease. Medical and experimental settings provide the initial evidence for these links, but individual studies produce disconnected pieces of knowledge. Each piece is placed in context only by expert researchers reading the full-text publications. Building a knowledge base (KB) that consolidates these disconnected pieces is an essential first step toward democratizing and accelerating access to the collective discoveries connecting the human microbiome to human disease. In this article, we survey existing tools and development efforts that capture portions of the information needed to construct a KB of all known human microbiome-disease associations. We also highlight the need for further innovation in natural language processing (NLP), text mining, taxonomic representation, and field-wide vocabulary standardization in human microbiome research. Addressing these challenges will enable the construction of KBs that help surface new insights amenable to experimental validation and, potentially, clinical decision support.
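Consolidating disconnected findings hinges on recognizing that differently worded statements refer to the same taxon and the same disease. The following minimal Python sketch illustrates this. It assumes hand-built synonym tables with invented placeholder IDs (`TAXON:0001`, `DISEASE:0001`) and does not use any real vocabulary or any of the tools surveyed here. It merges per-study findings under canonical concept pairs, flags pairs whose studies disagree on direction, and sets aside names it cannot resolve. The `Finding` records are toy inputs, not real study results.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical synonym tables: each maps a lower-cased surface string to a
# placeholder canonical ID. A real KB would populate these from curated
# taxonomic and disease vocabularies; these IDs are invented for illustration.
TAXON_SYNONYMS = {
    "clostridium difficile": "TAXON:0001",     # former genus name
    "clostridioides difficile": "TAXON:0001",  # current genus name
    "c. difficile": "TAXON:0001",              # abbreviated form
}
DISEASE_SYNONYMS = {
    "crohn's disease": "DISEASE:0001",
    "crohn disease": "DISEASE:0001",
    "regional enteritis": "DISEASE:0001",
}


@dataclass(frozen=True)
class Finding:
    """One taxon-disease statement as extracted from a single study."""
    study: str      # hypothetical study identifier
    taxon: str      # taxon name exactly as written in the text
    disease: str    # disease name exactly as written in the text
    direction: str  # reported change in the taxon, e.g. "increased"/"decreased"


def normalize(term, table):
    """Return the canonical ID for a surface string, or None if unknown."""
    return table.get(term.strip().lower())


def consolidate(findings):
    """Merge findings that refer to the same (taxon, disease) concept pair.

    Returns a mapping (taxon_id, disease_id) -> {direction: set of studies},
    plus the findings whose names could not be normalized.
    """
    kb = defaultdict(lambda: defaultdict(set))
    unresolved = []
    for f in findings:
        taxon_id = normalize(f.taxon, TAXON_SYNONYMS)
        disease_id = normalize(f.disease, DISEASE_SYNONYMS)
        if taxon_id is None or disease_id is None:
            unresolved.append(f)  # left for manual curation
            continue
        kb[(taxon_id, disease_id)][f.direction].add(f.study)
    return kb, unresolved


def conflicting_pairs(kb):
    """List concept pairs for which studies report more than one direction."""
    return sorted(pair for pair, dirs in kb.items() if len(dirs) > 1)


# Toy records for illustration only; these are not real study results.
EXAMPLE_FINDINGS = [
    Finding("study-A", "Clostridium difficile", "Crohn's disease", "increased"),
    Finding("study-B", "Clostridioides difficile", "Crohn disease", "increased"),
    Finding("study-C", "C. difficile", "regional enteritis", "decreased"),
    Finding("study-D", "Unnamed isolate X", "Crohn's disease", "increased"),
]

if __name__ == "__main__":
    kb, unresolved = consolidate(EXAMPLE_FINDINGS)
    for pair, dirs in kb.items():
        # ('TAXON:0001', 'DISEASE:0001') {'increased': ['study-A', 'study-B'], 'decreased': ['study-C']}
        print(pair, {d: sorted(s) for d, s in dirs.items()})
    print("conflicts:", conflicting_pairs(kb))               # [('TAXON:0001', 'DISEASE:0001')]
    print("unresolved:", [f.study for f in unresolved])      # ['study-D']
```

Without normalization, the three records naming the same organism would land under three different keys, and their disagreement would go unnoticed. With a shared vocabulary, they collapse onto one concept pair. The unrecognized name is left for curation, which is where better NER and taxonomic resources would help.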
