HITSZ_CNER: A hybrid system for entity recognition from Chinese clinical text

With the rapid development of electronic medical records, reusing these data for research and commercial purposes has attracted growing attention. Because entity recognition is one of the most fundamental tasks in medical information extraction, the 2017 China Conference on Knowledge Graph and Semantic Computing (CCKS) challenge set up a track for clinical named entity recognition (CNER). The organizers provided 400 annotated Chinese medical records for this track, of which 300 are used as the training set and 100 as the test set. A further 2,605 raw medical records were released as an unlabeled set. In this study, we develop a hybrid system for the CNER task that combines rule-based, CRF (conditional random field) and RNN (recurrent neural network) methods. Experiments on the official test set show that our system achieves F1-scores of 91.08% and 94.26% under the “strict” and “relaxed” criteria respectively, ranking first in the 2017 CCKS CNER challenge. Applying a self-training method with the unlabeled data improves the F1-scores of all machine learning-based methods by about 1.0% under the “strict” criterion. Our future work will focus on more effective extraction of body, disease and treatment entities.
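
To make the two evaluation criteria concrete, the following is a minimal sketch of how "strict" and "relaxed" F1-scores could be computed. It rests on assumptions not stated in this abstract: that "strict" requires an exact match of both boundaries and entity type, and that "relaxed" accepts any boundary overlap with the same type. The `Entity` tuple layout and the `f1_score` function are hypothetical names used only for illustration; this is not the official CCKS scorer.

```python
from typing import List, Tuple

# Hypothetical entity representation: (start, end_exclusive, entity_type).
Entity = Tuple[int, int, str]


def _overlaps(a: Entity, b: Entity) -> bool:
    """True when the two character spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def _matches(pred: Entity, gold: Entity, relaxed: bool) -> bool:
    """Assumed criteria: the type must agree in both modes; boundaries
    must be identical (strict) or merely overlap (relaxed)."""
    if pred[2] != gold[2]:
        return False
    if relaxed:
        return _overlaps(pred, gold)
    return pred[0] == gold[0] and pred[1] == gold[1]


def f1_score(gold: List[Entity], pred: List[Entity], relaxed: bool = False) -> float:
    """Entity-level F1 under the assumed strict or relaxed criterion."""
    # Predicted entities matched by some gold entity count towards precision.
    pred_hits = sum(any(_matches(p, g, relaxed) for g in gold) for p in pred)
    # Gold entities matched by some predicted entity count towards recall.
    gold_hits = sum(any(_matches(p, g, relaxed) for p in pred) for g in gold)
    precision = pred_hits / len(pred) if pred else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up spans: the second prediction has the right type but a
    # shifted left boundary, so it counts only under the relaxed criterion.
    gold = [(0, 2, "body"), (5, 9, "disease")]
    pred = [(0, 2, "body"), (6, 9, "disease")]
    print("strict :", f1_score(gold, pred))
    print("relaxed:", f1_score(gold, pred, relaxed=True))
```

Under these assumptions a prediction with a slightly misplaced boundary is penalized only by the strict score, which is consistent with the relaxed F1 reported above being higher than the strict one.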