Ad Hoc Information Extraction for Clinical Data Warehouses

Summary
Background: Clinical Data Warehouses (CDWs) reuse electronic health records (EHRs) to make their data retrievable for research purposes or for recruiting patients into clinical trials. However, much information is hidden in unstructured data such as discharge letters. This text can be preprocessed and converted into structured data via information extraction (IE). Unfortunately, IE is a laborious task and is therefore usually not available for most of the text data in a CDW.
Objectives: The goal of our work is to provide an ad hoc IE service that lets users query text data ad hoc, in a manner similar to querying structured data in a CDW. Search engines merely return text snippets. Our system also returns frequencies based on the content of discharge letters or of textual reports for special investigations such as echocardiography, e.g. how many patients have "heart failure" (including its textual synonyms) or how many patients have an LVEF < 45%. Three subtasks are addressed: (1) recognizing and excluding negations and their scopes, (2) extracting concepts, i.e. Boolean values, and (3) extracting numerical values.
Methods: We implemented an extended version of the NegEx algorithm for German texts that detects negations and determines their scope. Furthermore, we extended our document-oriented CDW PaDaWaN with query functions, e.g. context-sensitive queries and regex queries, and with an extraction mode that computes the frequencies of Boolean and numerical values.
Results: Evaluations on chest X-ray reports and discharge letters showed high F1-scores for all three subtasks. Detection of negated concepts reached an F1-score of 0.99 in chest X-ray reports and 0.97 in discharge letters. Extraction of Boolean values in chest X-ray reports reached about 0.99. Extraction of numerical values in chest X-ray reports and discharge letters also reached around 0.99, with the exception of the concept age.
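The three subtasks can be illustrated with a minimal Python sketch in the spirit of NegEx. It is not the authors' extended German NegEx or PaDaWaN code. The trigger and terminator lists, the six-token scope window, the sentence splitter and the function names (`negation_mask`, `has_concept`, `extract_value`) are hypothetical simplifications. Pre-negation triggers such as "kein Hinweis auf" negate the tokens that follow them. Post-negation triggers such as "ausgeschlossen" negate the tokens that precede them. A scope stops at a termination term such as "aber", at the window limit, or at the sentence end.

```python
import re

# Hypothetical, deliberately tiny German lexicons (lower-cased token tuples);
# a real NegEx lexicon contains far more triggers.
PRE_TRIGGERS = [("kein", "hinweis", "auf"), ("kein",), ("keine",),
                ("keinen",), ("ohne",)]
POST_TRIGGERS = [("ausgeschlossen",), ("nicht", "nachweisbar")]
TERMINATORS = {"aber", "jedoch"}  # words that end a negation scope
WINDOW = 6  # assumed maximum scope length in tokens


def sentences(text):
    # Split at . ! ? ; followed by whitespace or end of text, so "0.5" stays intact.
    return [s for s in re.split(r"[.!?;](?=\s|$)", text) if s.strip()]


def tokenize(sentence):
    # Numbers with a decimal point or comma ("45,5") are kept as one token.
    return re.findall(r"\d+(?:[.,]\d+)?|\w+|[^\w\s]", sentence.lower())


def match_at(tokens, i, phrases):
    """Length of the longest phrase that starts at tokens[i], or 0."""
    best = 0
    for phrase in phrases:
        if tuple(tokens[i:i + len(phrase)]) == phrase:
            best = max(best, len(phrase))
    return best


def negation_mask(tokens):
    """Subtask 1: flag every token inside the scope of a negation trigger."""
    negated = [False] * len(tokens)
    i = 0
    while i < len(tokens):
        pre = match_at(tokens, i, PRE_TRIGGERS)
        post = match_at(tokens, i, POST_TRIGGERS)
        if pre:  # scope runs forward from the trigger
            k, stop = i + pre, min(len(tokens), i + pre + WINDOW)
            while k < stop and tokens[k] not in TERMINATORS:
                negated[k] = True
                k += 1
            i += pre
        elif post:  # scope runs backward from the trigger
            k, stop = i - 1, max(-1, i - 1 - WINDOW)
            while k > stop and tokens[k] not in TERMINATORS:
                negated[k] = True
                k -= 1
            i += post
        else:
            i += 1
    return negated


def has_concept(text, synonyms):
    """Subtask 2: True if any synonym occurs outside every negation scope."""
    syns = [tuple(tokenize(s)) for s in synonyms]
    for sent in sentences(text):
        tokens = tokenize(sent)
        negated = negation_mask(tokens)
        for i in range(len(tokens)):
            n = match_at(tokens, i, syns)
            if n and not any(negated[i:i + n]):
                return True
    return False


def extract_value(text, concept, max_gap=3):
    """Subtask 3: first number within max_gap tokens after a non-negated concept."""
    key = tuple(tokenize(concept))
    for sent in sentences(text):
        tokens = tokenize(sent)
        negated = negation_mask(tokens)
        for i in range(len(tokens)):
            if match_at(tokens, i, [key]) and not negated[i]:
                for tok in tokens[i + len(key):i + len(key) + max_gap]:
                    if re.fullmatch(r"\d+(?:[.,]\d+)?", tok):
                        return float(tok.replace(",", "."))
    return None
```

With these toy lexicons, `has_concept("Keine Stauung, aber Pneumonie rechts.", ["Pneumonie"])` returns `True` because "aber" closes the scope opened by "Keine". `has_concept("Pneumothorax ausgeschlossen.", ["Pneumothorax"])` returns `False`. `extract_value("LVEF von 45,5 %.", "LVEF")` returns `45.5`.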
Discussion: The advantages of ad hoc IE over standard IE are the low development effort (just entering the concept with its variants), the promptness of the results, and the user's ability to adapt the extraction to his or her particular question. The disadvantages are usually lower accuracy and confidence. This ad hoc IE approach is novel and goes beyond existing systems. Roogle [8] extracts predefined concepts from texts during preprocessing and makes them retrievable at runtime. Dr. Warehouse [36] applies negation detection and indexes the resulting subtexts that contain affirmed findings. Our approach combines negation detection with concept extraction, but the extraction takes place at runtime rather than during preprocessing. This enables ad hoc, dynamic, interactive and adjustable extraction of arbitrary concepts, and even of their values, on the fly. Conclusions: We developed an ad hoc information extraction query feature for Boolean and numerical values within a CDW that achieves high recall and precision. It is based on a pipeline that detects and removes negations and their scopes in clinical texts.
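The difference between extraction at runtime and extraction during preprocessing can be illustrated by a query that is compiled from the user's input only when it is asked. The query is then evaluated directly over the raw report texts, with no concept index built in advance. The following Python sketch assumes a toy corpus that maps hypothetical patient IDs to report texts. `boolean_query`, `numeric_query` and `count_patients` are illustrative names, not PaDaWaN's API. For brevity, negation removal is left out here, although the described pipeline removes negated scopes before counting.

```python
import re
from typing import Callable, Dict, List

# Toy corpus type: hypothetical patient ID -> list of report texts.
Corpus = Dict[str, List[str]]


def boolean_query(variants: List[str]) -> Callable[[str], bool]:
    """Compile user-entered variants (synonyms) into a matcher at query time."""
    alternatives = "|".join(re.escape(v) for v in variants)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return lambda text: pattern.search(text) is not None


def numeric_query(concept: str, condition: Callable[[float], bool],
                  max_gap: int = 20) -> Callable[[str], bool]:
    """Match texts in which a number following the concept satisfies condition."""
    # Up to max_gap non-digit characters may separate the concept from its value;
    # German decimal commas ("45,5") are accepted as well as decimal points.
    pattern = re.compile(
        re.escape(concept) + r"\D{0," + str(max_gap) + r"}?(\d+(?:[.,]\d+)?)",
        re.IGNORECASE,
    )

    def matches(text: str) -> bool:
        return any(condition(float(m.group(1).replace(",", ".")))
                   for m in pattern.finditer(text))

    return matches


def count_patients(corpus: Corpus, query: Callable[[str], bool]) -> int:
    """Count patients with at least one matching report, evaluated on the fly."""
    return sum(1 for reports in corpus.values() if any(query(r) for r in reports))


corpus: Corpus = {
    "p1": ["Bekannte Herzinsuffizienz. LVEF 35 %."],
    "p2": ["Echo: LVEF 55 %.", "Herzschwäche NYHA II."],
    "p3": ["Unauffälliger Befund."],
}
print(count_patients(corpus, boolean_query(["Herzinsuffizienz", "Herzschwäche"])))  # 2
print(count_patients(corpus, numeric_query("LVEF", lambda v: v < 45)))  # 1
```

Because nothing is precomputed, changing the synonym list or the threshold only changes the compiled pattern or predicate. The next query reflects the change immediately. This is the adjustability that the Discussion attributes to the ad hoc approach, traded against the cost of scanning the texts at query time.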

[1] Gottfried Vossen, et al. The Single Source Architecture x4T to Connect Medical Documentation and Clinical Research, 2011, MIE.

[2] Paul A. Harris, et al. Secondary use of clinical data: The Vanderbilt approach, 2014, J. Biomed. Informatics.

[3] Horacio Rodríguez, et al. Syntactic methods for negation detection in radiology reports in Spanish, 2016, BioNLP@ACL.

[4] Griffin M. Weber, et al. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2), 2010, J. Am. Medical Informatics Assoc.

[5] Paul G. Biondich, et al. The OpenMRS System: Collaborating Toward an Open Source EMR for Developing Countries, 2006, AMIA.

[6] P. Harris, et al. Research electronic data capture (REDCap) - A metadata-driven methodology and workflow process for providing translational research informatics support, 2009, J. Biomed. Informatics.

[7] Christel Daniel-Le Bozec, et al. EHR4CR: A Semantic Web Based Interoperability Approach for Reusing Electronic Healthcare Records in Protocol Feasibility Studies, 2012, SWAT4LS.

[8] André Happe, et al. Roogle: An Information Retrieval Engine for Clinical Data Warehouse, 2011, MIE.

[9] Mike Conway, et al. Extending the NegEx Lexicon for Multiple Languages, 2013, MedInfo.

[10] Hongfang Liu, et al. DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx, 2015, J. Biomed. Informatics.

[11] Markus Krug, et al. Semi-Automatic Terminology Generation for Information Extraction from German Chest X-Ray Reports, 2017, GMDS.

[12] Martijn J. Schuemie, et al. ContextD: an algorithm to identify contextual properties of medical terms in a Dutch clinical corpus, 2014, BMC Bioinformatics.

[13] Hans Uszkoreit, et al. Negation Detection in Clinical Reports Written in German, 2016, BioTxtM@COLING 2016.

[14] Jun Gao, et al. DW4TR: A Data Warehouse for Translational Research, 2011, J. Biomed. Informatics.

[15] Yi Zhang, et al. Information Extraction from German Patient Records via Hybrid Parsing and Relation Extraction Strategies, 2014, LREC.

[16] Wendy W. Chapman, et al. Evaluation of negation phrases in narrative clinical reports, 2001, AMIA.

[17] Frank Puppe, et al. Fine-grained information extraction from German transthoracic echocardiography reports, 2015, BMC Medical Informatics and Decision Making.

[18] Wendy W. Chapman, et al. ConText: An algorithm for determining negation, experiencer, and temporal status from clinical reports, 2009, J. Biomed. Informatics.

[19] W. Premauer, et al. ArchiMed: a medical information and retrieval system, 1999, Methods of Information in Medicine.

[20] Frank Puppe, et al. Extending the Query Language of a Data Warehouse for Patient Recruitment, 2017, GMDS.

[21] Susan C. Weber, et al. STRIDE - An Integrated Standards-Based Translational Research Informatics Platform, 2009, AMIA.

[22] Maria Skeppstedt, et al. Negation detection in Swedish clinical text: An adaption of NegEx to Swedish, 2011, J. Biomed. Semant.

[23] Jeffrey M. Miller, et al. Harvest: an open platform for developing web-based biomedical data discovery and reporting applications, 2013, J. Am. Medical Informatics Assoc.

[24] Ernestina Menasalvas Ruiz, et al. An Approach to Detect Negation on Medical Documents in Spanish, 2014, Brain Informatics and Health.

[25] Manfred Stede, et al. Determining Negation Scope in German and English Medical Diagnoses, 2014.

[26] James J. Masanz, et al. Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing, 2014, PLoS ONE.

[27] Helmut Schmid. Probabilistic part-of-speech tagging using decision trees, 1994.

[28] Yang Huang, et al. A novel hybrid approach to automated negation detection in clinical radiology reports, 2007, J. Am. Medical Informatics Assoc.

[29] Ulf Leser, et al. How to improve information extraction from German medical records, 2017, it Inf. Technol.

[30] Peter L. Elkin, et al. A controlled trial of automated classification of negation from clinical notes, 2005, BMC Medical Informatics and Decision Making.

[31] Sunghwan Sohn, et al. Dependency Parser-based Negation Detection in Clinical Narratives, 2012, AMIA Joint Summits on Translational Science Proceedings.

[32] Patrice Degoulet, et al. Translational research platforms integrating clinical and omics data: a review of publicly available solutions, 2014, Briefings in Bioinformatics.

[33] Wendy W. Chapman, et al. A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries, 2001, J. Biomed. Informatics.

[34] Alberto Riva, et al. BigQ: a NoSQL based framework to handle genomic variants in i2b2, 2015, BMC Bioinformatics.

[35] Cyril Grouin, et al. Detecting negation of medical problems in French clinical notes, 2012, IHI '12.

[36] Anita Burgun-Parenthoine, et al. Improving a full-text search engine: the importance of negation detection and family history context to identify cases in a biomedical data warehouse, 2017, J. Am. Medical Informatics Assoc.