Exploring semantic deep learning for building reliable and reusable one health knowledge from PubMed systematic reviews and veterinary clinical notes

Background Deep Learning opens up opportunities for routinely scanning large bodies of biomedical literature and clinical narratives to represent the meaning of biomedical and clinical terms. However, validating and integrating this knowledge at scale requires cross-checking against ground truths (i.e. evidence-based resources) that are unavailable in an actionable or computable form. In this paper we explore how to turn information about diagnoses, prognoses, therapies and other clinical concepts into computable knowledge using free-text data about human and animal health. We used a Semantic Deep Learning approach that combines Semantic Web technologies and Deep Learning to acquire and validate knowledge about 11 well-known medical conditions mined from two sets of unstructured free-text data: 300 K PubMed Systematic Review articles (the PMSB dataset) and 2.5 M veterinary clinical notes (the VetCN dataset). For each target condition we obtained 20 related clinical concepts with each of two deep learning methods, applied separately to each of the two datasets, resulting in 880 term pairs (target term, candidate term). Each concept, represented by an n-gram, was mapped to UMLS using MetaMap; we also developed a bespoke method for mapping short forms (e.g. abbreviations and acronyms). Existing ontologies were used to formally represent associations. We also created ontological modules and illustrate how the extracted knowledge can be queried. The evaluation was performed using the content within BMJ Best Practice.
Results MetaMap achieves an F measure of 88% (precision 85%, recall 91%) when applied directly to the 613 unique candidate terms in the 880 term pairs. When the processing of short forms is included, MetaMap achieves an F measure of 94% (precision 92%, recall 96%). Validation of the term pairs against BMJ Best Practice yields precision between 98 and 99%.
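The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that the F measure is the balanced F1 score (the harmonic mean of precision and recall) and that the 880 term pairs arise as 11 conditions × 20 concepts × 2 methods × 2 datasets; neither assumption is stated explicitly in the abstract, and the function and variable names are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract does not say which F measure was used;
    F1 is the common default and reproduces the reported values.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed breakdown of the 880 (target term, candidate term) pairs.
conditions, concepts_per_condition, methods, datasets = 11, 20, 2, 2
term_pairs = conditions * concepts_per_condition * methods * datasets

# Precision and recall as reported for MetaMap, without and with
# the bespoke processing of short forms.
f_direct = f_measure(0.85, 0.91)
f_short_forms = f_measure(0.92, 0.96)

print(term_pairs)                  # 880
print(round(f_direct * 100))       # 88
print(round(f_short_forms * 100))  # 94
```

Under these assumptions the rounded values agree with the reported 88% and 94%.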
Conclusions The Semantic Deep Learning approach can transform neural embeddings built from unstructured free-text data into reliable and reusable One Health knowledge using ontologies and content from BMJ Best Practice.
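As an illustration of how candidate terms might be read off neural embeddings, the following minimal sketch ranks terms by cosine similarity to a target term and emits (target term, candidate term) pairs. It is not the authors' pipeline: the abstract does not name the two deep learning methods, and the vectors, terms and the `top_candidates` helper are hypothetical toy values chosen only to make the example runnable.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_candidates(target, embeddings, k=20):
    """Return the k terms most similar to `target`, best first."""
    target_vec = embeddings[target]
    scored = [
        (term, cosine(target_vec, vec))
        for term, vec in embeddings.items()
        if term != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real embeddings would be
# learned from the free-text corpora and have far more dimensions.
toy_embeddings = {
    "sepsis": [0.9, 0.1, 0.0],
    "septic_shock": [0.8, 0.2, 0.1],
    "antibiotics": [0.6, 0.5, 0.1],
    "glaucoma": [0.0, 0.1, 0.9],
}

# Build (target term, candidate term) pairs for one target condition.
pairs = [
    ("sepsis", term)
    for term, _ in top_candidates("sepsis", toy_embeddings, k=2)
]
print(pairs)
```

In the setting described in the abstract, `k` would be 20 and each resulting pair would then be mapped to UMLS and validated against an evidence-based resource.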
