Named Entity Recognition in Chinese Electronic Medical Records Based on CRF

Large volumes of electronic medical records (EMRs) contain a wealth of knowledge, and named entity recognition (NER) in Chinese EMRs is an important task. However, owing to the lack of a Chinese medical dictionary, there have been few studies on NER in Chinese EMRs. In this paper, we first build a medical dictionary. We then investigate the effects of different types of features on Chinese clinical NER using the Conditional Random Fields (CRF) algorithm, one of the most popular algorithms for NER. The features studied are bag-of-characters, part-of-speech, dictionary, and word clustering features. For the experiments, we randomly selected 220 clinical texts from Peking Anzhen Hospital. The results show that each of these features benefits Chinese named entity recognition, to varying degrees. Finally, from an analysis of the experimental results, we derive several rules of thumb.
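To make the four feature types concrete, the following is a minimal sketch of how per-character feature dictionaries for a CRF tagger might be assembled. It is not the authors' implementation: the function names (dictionary_flags, char_features), the forward-maximum-matching lookup, the B/I/O dictionary flags, the character-level cluster lookup, and the window size are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def dictionary_flags(chars: List[str], lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Flag characters covered by a lexicon term (hypothetical helper).

    Uses forward maximum matching: at each position, try the longest
    candidate first. Matched characters get "B" (first) or "I" (rest),
    all others get "O".
    """
    flags = ["O"] * len(chars)
    i = 0
    while i < len(chars):
        matched = 0
        for n in range(min(max_len, len(chars) - i), 0, -1):
            if "".join(chars[i:i + n]) in lexicon:
                matched = n
                break
        if matched:
            flags[i] = "B"
            for j in range(i + 1, i + matched):
                flags[j] = "I"
            i += matched
        else:
            i += 1
    return flags


def char_features(
    chars: List[str],
    pos_tags: List[str],
    lexicon: Set[str],
    clusters: Dict[str, str],
    window: int = 1,
) -> List[Dict[str, str]]:
    """Build one feature dict per character (illustrative feature names)."""
    flags = dictionary_flags(chars, lexicon)
    features = []
    for i, ch in enumerate(chars):
        feats = {
            "char": ch,                           # bag-of-characters: the character itself
            "pos": pos_tags[i],                   # part-of-speech tag aligned to the character
            "dict": flags[i],                     # dictionary feature from the lexicon match
            "cluster": clusters.get(ch, "UNK"),   # cluster ID (assumed lookup table)
        }
        # Context characters within the window, padded at sentence edges.
        for off in range(-window, window + 1):
            if off == 0:
                continue
            j = i + off
            feats[f"char[{off:+d}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
        features.append(feats)
    return features
```

A sequence of such dictionaries, paired with one entity label per character, is the kind of input a CRF toolkit would typically be trained on. Dropping one key at a time from the dictionaries is one simple way to compare the contribution of each feature type.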
