Named Entity Recognition Over Electronic Health Records Through a Combined Dictionary-based Approach

Abstract. In health care information systems, electronic health records are an important source of knowledge about individual health histories. Extracting valuable knowledge from these records is challenging because they are composed of data of different kinds: images, test results, and narrative texts. The narrative texts range from highly codified entries to notes that vary widely in language and detail and that use ad hoc terminology, including acronyms and jargon. This paper proposes a combined approach for recognizing named entities in such narrative texts. The approach is a composition of three different methods. The possible combinations are evaluated, and the resulting composition improves recall with a limited impact on precision for the named entity recognition process.
