Enhancement of Medical Named Entity Recognition Using Graph-Based Features

Named Entity Recognition (NER) is a crucial step in text mining. This paper proposes a new graph-based technique for representing unstructured medical text. Discriminative features extracted from this representation are used to improve NER performance. The technique is evaluated on the i2b2 medication challenge data set: 'treatment' named entities are extracted using six different classifiers. The F-measure improves for five of the six classifiers, with an average improvement of up to 26%.
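The abstract does not say how the graph is built or which features are taken from it. As an illustration only, the sketch below assumes one common choice, an undirected word co-occurrence graph over a sliding window. It computes each token's node degree and weighted degree as extra features that could be appended to the token's feature vector before classification. The function names, the window size, and the feature set are hypothetical and are not the paper's method.

```python
from collections import defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Build an undirected, weighted word co-occurrence graph (hypothetical).

    Nodes are lower-cased tokens. An edge links two distinct tokens that
    appear within `window` positions of each other in the same sentence;
    its weight counts how many times that happens across the corpus.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        words = [t.lower() for t in tokens]
        for i, w in enumerate(words):
            # Look only forward so each co-occurring pair is counted once per position.
            for j in range(i + 1, min(i + window + 1, len(words))):
                v = words[j]
                if v != w:
                    graph[w][v] += 1
                    graph[v][w] += 1
    return graph


def token_graph_features(graph, token):
    """Return assumed per-token graph features: degree and weighted degree.

    Tokens that never co-occur with another token get zeros.
    """
    neighbors = graph.get(token.lower(), {})
    return {
        "degree": len(neighbors),
        "weighted_degree": sum(neighbors.values()),
    }


if __name__ == "__main__":
    corpus = [
        ["Patient", "started", "on", "aspirin"],
        ["Continue", "aspirin", "daily"],
        ["aspirin", "daily"],
    ]
    g = build_cooccurrence_graph(corpus, window=2)
    print(token_graph_features(g, "Aspirin"))  # → {'degree': 4, 'weighted_degree': 5}
```

In a full pipeline these values would sit alongside conventional token features (word shape, affixes, context words) as inputs to whichever classifier is being compared. Degree-style features reward tokens that appear in many distinct contexts, which is only one of many possible graph statistics.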
