Semantic Modeling for Exposomics with Exploratory Evaluation in Clinical Context

The exposome is a critical dimension of the precision medicine paradigm. Effective representation of exposomics knowledge is instrumental to incorporating nongenetic factors into data analytics for clinical research. Limited work exists on (1) modeling exposome entities and relations with proper integration into mainstream ontologies and (2) systematically studying their presence in clinical contexts. Using selected ontological relations, we developed a template-driven approach to identifying exposome concepts in the Unified Medical Language System (UMLS). The derived concepts were evaluated in terms of literature coverage and their ability to assist in annotating clinical text. The generated semantic model represents rich domain knowledge about exposure events (454 pairs of relations between exposure and outcome). Additionally, a list of 5667 disorder concepts with microbial etiology was created for inferred pathogen exposures. The model consistently covered about 90% of PubMed literature on exposure-induced iatrogenic diseases over 10 years (2001–2010). The model improved the efficiency of exposome annotation in clinical text by filtering out 78% of irrelevant machine annotations. Analysis of 50 annotated discharge summaries helped advance our understanding of exposome information in clinical text. This pilot study demonstrated the feasibility of semiautomatically developing a useful semantic resource for exposomics.
