A corpus-driven standardization framework for encoding clinical problems with HL7 FHIR

Free-text problem descriptions are brief statements of patient diagnoses and issues, commonly found in problem lists and other prominent areas of the medical record. These compact descriptions often express complex and nuanced medical conditions, which makes their semantics difficult to capture and standardize fully. In this study, we describe a framework for transforming free-text problem descriptions into standardized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) models. The approach combines domain-specific dependency parsers, Bidirectional Encoder Representations from Transformers (BERT) natural language models, and cui2vec Unified Medical Language System (UMLS) concept vectors to align concepts extracted from free-text problem descriptions with structured FHIR models. A neural network model classifies thirteen relationship types between concepts, facilitating mapping to the FHIR Condition resource. We use data programming, a weak supervision approach, to eliminate the need for a manually annotated training corpus. Shapley values, a mechanism for quantifying each feature's contribution to a prediction, are used to interpret the impact of model features. Our methods identified the focus concept, the primary clinical concern of the problem description, with an F1 score of 0.95. Relationships from the focus to other modifying concepts were extracted with an F1 score of 0.90. When classifying relationships, our model achieved a weighted average F1 score of 0.89, enabling accurate mapping of attributes into HL7 FHIR models. The Shapley value analysis also showed that the BERT input representation contributed most to the classifier's decisions.
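To make the notion of a Shapley value concrete, the sketch below computes exact Shapley values for a tiny set function. It is an illustration of the general definition only, not the study's analysis pipeline: the function name `shapley_values`, the feature-group names, and every number in `TOY_SCORES` are assumptions invented for the arithmetic and do not come from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values of `value`, a function from a frozenset of players to a number.

    Each player's value is the weighted average of its marginal contribution
    over all subsets of the remaining players.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight for subsets of size k: k! * (n - k - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical scores for each subset of three generic feature groups.
# These numbers are made up purely to show the calculation.
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset({"feature_a"}): 0.6,
    frozenset({"feature_b"}): 0.2,
    frozenset({"feature_c"}): 0.1,
    frozenset({"feature_a", "feature_b"}): 0.7,
    frozenset({"feature_a", "feature_c"}): 0.7,
    frozenset({"feature_b", "feature_c"}): 0.3,
    frozenset({"feature_a", "feature_b", "feature_c"}): 0.8,
}


def toy_value(subset):
    """Look up the hypothetical score of a subset of feature groups."""
    return TOY_SCORES[frozenset(subset)]


players = ["feature_a", "feature_b", "feature_c"]
phi = shapley_values(players, toy_value)
for name in players:
    print(f"{name}: {phi[name]:.3f}")
```

The contributions sum to the score of the full feature set minus the score of the empty set, which is the efficiency property that makes Shapley values a convenient attribution scheme. Exact enumeration grows exponentially with the number of features, so analyses of real models typically rely on approximations instead.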
