Medical Entity Recognition: A Comparison of Semantic and Statistical Methods

Medical Entity Recognition is a crucial step towards the efficient analysis of medical texts. In this paper we present and compare three methods based on domain knowledge and machine learning techniques. Through these approaches we study two research directions: (i) noun phrases are first extracted with a chunker and then classified in a final step, and (ii) machine learning techniques are used to identify entity boundaries and categories simultaneously. Each approach is tested on a standard corpus of clinical texts. The results show that the hybrid approach, which combines machine learning with domain knowledge, obtains the best performance.
