Incorporating Knowledge Resources to Enhance Medical Information Extraction

This paper describes a method for extracting medical information from text. The method targets complaints and diagnoses in electronic health record texts, which are fundamental pieces of information that can support more complex medical tasks. It uses several medical knowledge resources to improve extraction performance. In an evaluation on the NTCIR-10 MedNLP data, our method achieved an F1 score of 86.53 under cross-validation, which is comparable to the scores of the top-scoring teams in the NTCIR-10 MedNLP task. The approach to incorporating knowledge resources is highly general: it is not restricted to the resources presented in this paper and can be applied to various other resources.
