A supervised named entity recognition for information extraction from medical records

Named entity recognition is a widely used task for extracting various kinds of information from unstructured text. Medical records, produced by hospitals every day, contain a huge amount of data about diseases, the medications used in treatment, and treatment success rates. A large number of systems exist for information retrieval from medical documentation, but they are mostly applied to documents written in English. This paper explains our approach to extracting disease and drug names from medical records written in Serbian. Our approach uses statistical language models and detects up to 80% of named entities. This is a good result given the very limited language resources available for Serbian, a scarcity that makes detection considerably more difficult.
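The abstract does not specify the form of the statistical language models. The sketch below shows one common way such models can label candidate tokens, under that stated assumption and not as the authors' method. It trains one character trigram model per class (disease, drug, other) with add-one (Laplace) smoothing and assigns a token the class whose model, plus a class prior, scores it highest. The class labels, the toy Serbian word lists, and the names `CharNgramLM` and `NgramEntityClassifier` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter

BOS, EOS = "^", "$"  # word-boundary markers used for padding


class CharNgramLM:
    """Character n-gram language model with add-one (Laplace) smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()    # (context, next_char) -> count
        self.context_counts = Counter()  # context -> count

    def _events(self, word):
        # Yield (context, next_char) pairs over the padded, lowercased word.
        padded = BOS * (self.n - 1) + word.lower() + EOS
        for i in range(self.n - 1, len(padded)):
            yield padded[i - self.n + 1:i], padded[i]

    def train(self, words):
        for word in words:
            for context, char in self._events(word):
                self.ngram_counts[(context, char)] += 1
                self.context_counts[context] += 1

    def log_prob(self, word, vocab_size):
        # Sum of smoothed log P(char | previous n-1 chars).
        return sum(
            math.log((self.ngram_counts[(ctx, ch)] + 1)
                     / (self.context_counts[ctx] + vocab_size))
            for ctx, ch in self._events(word)
        )


class NgramEntityClassifier:
    """Labels a token with the class whose language model scores it highest."""

    def __init__(self, training_data, n=3):
        self.models = {}
        alphabet = {EOS}
        for label, words in training_data.items():
            self.models[label] = CharNgramLM(n)
            self.models[label].train(words)
            for word in words:
                alphabet.update(word.lower())
        # Shared vocabulary size; +1 reserves mass for unseen characters.
        self.vocab_size = len(alphabet) + 1
        total = sum(len(words) for words in training_data.values())
        self.log_priors = {label: math.log(len(words) / total)
                           for label, words in training_data.items()}

    def classify(self, token):
        return max(self.models, key=lambda label: self.log_priors[label]
                   + self.models[label].log_prob(token, self.vocab_size))


# Hypothetical toy training data (not from the paper's corpus).
TRAINING_DATA = {
    "DISEASE": ["bronhitis", "gastritis", "artritis", "dijabetes", "pneumonija"],
    "DRUG": ["amoksicilin", "penicilin", "metformin", "insulin", "ibuprofen"],
    "OTHER": ["pacijent", "bolnica", "terapija", "doza", "lekar"],
}

clf = NgramEntityClassifier(TRAINING_DATA)
print(clf.classify("sinusitis"))  # → DISEASE
print(clf.classify("ampicilin"))  # → DRUG
```

Character-level models are a natural fit for a morphologically rich, low-resource language like Serbian. Recurring affixes such as "-itis" or "-cilin" carry class evidence even for words absent from the training lists, and no external dictionary or tagger is required. A realistic system would add context features and a proper evaluation of recall on annotated records.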
