Named Entity Recognition for Tamil Biomedical Documents

Valuable information about traditional Tamil medicines is available in various forms such as books, magazines, and websites. These instructions are, however, very large and unstructured. Our system focuses on constructing a named entity recognition (NER) module that uses an SVM classifier to identify named entities and classify them into their corresponding categories. The two main categories considered are names of disorders and names of ingredients used. The system uses features such as unigrams/bigrams, case markers, substring clues, and tf-idf scores to classify the entities. The recognized named entities are stored in the NE Dictionary according to their categories.
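The feature set listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the transliterated case-marker suffixes, the three-character suffix used as a "substring clue", and the smoothed tf-idf formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Illustrative transliterated Tamil case-marker suffixes.
# This list is an assumption, not the one used in the paper.
CASE_MARKERS = ("ukku", "aal", "ai", "il")


def tf_idf(term, doc, docs):
    """Smoothed tf-idf of `term` in `doc` (a token list) against corpus `docs`."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log((1 + len(docs)) / (1 + df)) + 1.0
    return tf * idf


def token_features(tokens, i, docs):
    """Build a feature dictionary for tokens[i] (hypothetical helper)."""
    tok = tokens[i]
    prev = tokens[i - 1] if i > 0 else "<s>"  # sentence-start placeholder
    feats = {
        "unigram=" + tok: 1.0,
        "bigram=" + prev + "_" + tok: 1.0,
        "suffix3=" + tok[-3:]: 1.0,  # substring clue: last three characters
        "tfidf": tf_idf(tok, tokens, docs),
    }
    # Case-marker clue: first listed suffix that the token ends with, if any.
    marker = next((m for m in CASE_MARKERS if tok.endswith(m)), None)
    if marker is not None:
        feats["case_marker=" + marker] = 1.0
    return feats


# Toy corpus of transliterated tokens, invented for this example.
docs = [
    ["kaaichchalukku", "sukku", "kashayam"],
    ["sukku", "milagu"],
]
features = token_features(docs[0], 0, docs)
```

Feature dictionaries of this shape could then be vectorized and passed to a linear SVM, for example with scikit-learn's `DictVectorizer` and `LinearSVC`, with one class each for disorders, ingredients, and non-entities. That pairing is a suggestion here, since the abstract does not name a toolkit.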
