Hybrid Named Entity Recognition System for South and South East Asian Languages
This paper is a submission to the NERSSEAL-2008 contest. Building a statistical Named Entity Recognition (NER) system requires a large data set, while a rule-based system requires linguistic analysis to formulate rules. Enriching the language-specific rules can give better results than statistical methods of named entity recognition. A hybrid model proved better at identifying Named Entities (NEs) in Indian languages, where the task is far more complicated than in English because of variation in the lexical and grammatical features of these languages.