Improving a full-text search engine: the importance of negation detection and family history context to identify cases in a biomedical data warehouse

Objective: The repurposing of electronic health records (EHRs) can improve clinical and genetic research for rare diseases. However, much of the significant information in rare disease EHRs is embedded in narrative reports, which contain many negated clinical signs and mentions of family medical history. This paper presents a method to detect family history and negation in narrative reports and evaluates its impact on selecting populations from a clinical data warehouse (CDW).

Materials and Methods: We developed a pipeline to process 1.6 million reports from multiple sources. This pipeline is part of the load process of the Necker Hospital CDW.

Results: We identified patients with "Lupus and diarrhea," "Crohn's and diabetes," and "NPHP1" from the CDW. The overall precision, recall, specificity, and F-measure were 0.85, 0.98, 0.93, and 0.91, respectively.

Conclusion: The proposed method identifies cases from a CDW of rare disease EHRs with high accuracy.
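As a reading aid for the reported figures, the sketch below shows how the four evaluation measures are conventionally derived from confusion-matrix counts. It assumes the F-measure is the balanced F1 (harmonic mean of precision and recall), which the abstract does not state explicitly; the function names and the example counts are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1.0 gives the balanced F1 (assumed here)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard retrieval metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure(precision, recall),
    }


# Consistency check on the reported precision and recall (assuming F1).
reported_f = round(f_measure(0.85, 0.98), 2)

# Hypothetical counts, for illustration only.
example = metrics_from_counts(tp=8, fp=2, tn=9, fn=1)
```

Under the F1 assumption, the reported precision of 0.85 and recall of 0.98 give an F-measure that rounds to 0.91, which matches the value stated in the Results.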
