An Experiment on Automatic Detection of Named Entities in Bangla

Many problems in automatic Natural Language Processing require several preprocessing steps. One major step is named-entity detection, which is relatively simple in English because such entities start with an uppercase character. For Indian scripts like Bangla, no such indicator exists and identification is more complex, especially for human names, which may also be common nouns or adjectives. In this paper we propose a three-stage approach to named-entity detection. The stages are based on a Named-Entity (NE) dictionary, rules for named entities, and left-right co-occurrence statistics. Experimental results obtained on a corpus from Anandabazar Patrika (the most popular Bangla newspaper) are quite encouraging.
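To make the three-stage idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tokenized input, and every resource in it is hypothetical: the toy dictionary, the honorific and suffix rules, the romanized placeholder tokens, and the simple additive score with a threshold are stand-ins for the paper's actual dictionary, rules, and co-occurrence statistics, which the abstract does not specify.

```python
from collections import Counter

# Hypothetical toy resources (assumptions for illustration only).
NE_DICTIONARY = {"kolkata", "rabindranath"}
HONORIFICS = {"sri", "dr"}          # assumed rule: a word after an honorific is a name
NE_SUFFIXES = ("pur", "nagar")      # assumed rule: common place-name endings


def train_context_counts(tagged_sentences):
    """Count which words appear immediately left and right of known entities.

    Each sentence is a list of (word, is_named_entity) pairs.
    """
    left, right = Counter(), Counter()
    for sentence in tagged_sentences:
        for i, (word, is_ne) in enumerate(sentence):
            if not is_ne:
                continue
            if i > 0:
                left[sentence[i - 1][0]] += 1
            if i + 1 < len(sentence):
                right[sentence[i + 1][0]] += 1
    return left, right


def detect_named_entities(tokens, left, right, threshold=2):
    """Return (token, stage) pairs for tokens accepted by one of three stages."""
    found = []
    for i, token in enumerate(tokens):
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        if token in NE_DICTIONARY:
            # Stage 1: dictionary lookup.
            found.append((token, "dictionary"))
        elif prev_tok in HONORIFICS or token.endswith(NE_SUFFIXES):
            # Stage 2: hand-written rules.
            found.append((token, "rule"))
        elif left[prev_tok] + right[next_tok] >= threshold:
            # Stage 3: left-right co-occurrence evidence (assumed additive score).
            found.append((token, "cooccurrence"))
    return found
```

A usage example with made-up training data: calling `train_context_counts` on a few tagged sentences and passing the resulting counters to `detect_named_entities` labels each detected token with the stage that accepted it. Running the stages in this order means the cheaper and more precise checks fire first, and the statistical stage only handles what the dictionary and rules miss; whether the paper orders or scores its stages this way is an assumption.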
