Biomedical Named Entity Recognition Using Machine Learning Classifiers and a Rich Feature Set

As the body of biomedical literature grows, there is a rising need for effective natural language processing tools to help organize, curate, and retrieve the knowledge it contains. Named entity recognition (the task of identifying words and phrases in free text that belong to certain classes of interest) is an important first step toward many of these larger information management goals. The task is harder in a specialized domain, because the entities are specific to that domain. In recent years, much attention has focused on recognizing mentions of genes, proteins, and other biomedical entities in biomedical abstracts. This study therefore designs and develops a biomedical named entity recognition model. It proposes a machine learning classification framework based on Naïve Bayes, K-Nearest Neighbour, and decision tree classifiers. We performed several experiments to empirically compare different subsets of features and the three classification approaches (Naïve Bayes, K-Nearest Neighbour, and decision tree) for biomedical named entity recognition. The aim is to integrate different feature sets and classification algorithms efficiently to obtain a more accurate classification procedure. The results show that K-Nearest Neighbour, trained with suitable features, is better suited to recognizing named entities in biomedical texts than the other two models.
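The abstract does not specify the feature set, distance measure, or value of k, so the following is only a minimal sketch of a K-Nearest Neighbour token tagger under stated assumptions: each token is tagged independently with a BIO label, it is described by a hypothetical set of orthographic, affix, and neighbouring-word features, and distance is the number of mismatched feature values. The names `token_features`, `overlap_distance`, and `KNNTagger`, the choice k = 3, and the two training sentences are illustrative inventions, not the authors' code or data. The optional `feature_subset` argument mirrors the idea of comparing different subsets of features.

```python
from collections import Counter

# Hypothetical feature template (orthographic, affix and context-word features).
# This is an assumption; the study's actual feature set is not reproduced here.
def token_features(tokens, i):
    w = tokens[i]
    return {
        "lower": w.lower(),
        "prefix3": w[:3].lower(),
        "suffix3": w[-3:].lower(),
        "init_cap": w[:1].isupper(),
        "has_digit": any(c.isdigit() for c in w),
        "has_hyphen": "-" in w,
        "all_caps": w.isupper(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }


def overlap_distance(a, b, keys):
    # Hamming-style distance for categorical features: count mismatched values.
    return sum(1 for key in keys if a[key] != b[key])


class KNNTagger:
    """Token-level k-nearest-neighbour tagger over categorical features."""

    def __init__(self, k=3, feature_subset=None):
        self.k = k
        self.feature_subset = feature_subset  # None means "use every feature"
        self.examples = []  # (feature dict, label) pairs from the training data

    def fit(self, sentences):
        # sentences: iterable of (tokens, labels) with one label per token
        for tokens, labels in sentences:
            for i, label in enumerate(labels):
                self.examples.append((token_features(tokens, i), label))
        return self

    def predict_token(self, feats):
        keys = self.feature_subset or list(feats)
        # sorted() is stable, so distance ties keep training order.
        nearest = sorted(
            self.examples, key=lambda ex: overlap_distance(feats, ex[0], keys)
        )[: self.k]
        # Majority vote among the k nearest training tokens.
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def tag(self, tokens):
        return [self.predict_token(token_features(tokens, i))
                for i in range(len(tokens))]


# Toy BIO-labelled sentences, invented for illustration only.
TRAIN = [
    (["IL-2", "activates", "NF-kappaB", "in", "vitro"],
     ["B-protein", "O", "B-protein", "O", "O"]),
    (["The", "p53", "protein", "binds", "DNA"],
     ["O", "B-protein", "I-protein", "O", "O"]),
]

tagger = KNNTagger(k=3).fit(TRAIN)
print(tagger.tag(["IL-6", "activates", "STAT3"]))
# ['B-protein', 'O', 'B-protein']
```

Because every prediction compares the query token against all stored training tokens, prediction cost grows linearly with the training set, the usual trade-off of instance-based learners such as KNN. A Naïve Bayes or decision tree classifier could be placed behind the same `fit`/`tag` interface to run the kind of three-way comparison the abstract describes.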