Paper information - Parts Of Speech Tagging for Indian Languages: A Literature Survey


Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag or other lexical class marker to every word in a sentence. POS tagging is considered a basic, necessary tool in many natural language processing (NLP) applications, such as word sense disambiguation, information retrieval, information processing, parsing, question answering, and machine translation. Identifying the ambiguities in a language's lexical items is the main challenge in developing an efficient and accurate POS tagger. A survey of the literature shows that, for Indian languages, POS taggers have been developed only for Hindi, Bengali, Panjabi, and the Dravidian languages. Some generic POS taggers covering Hindi, Bengali, and Telugu have also been developed. The proposed POS taggers are based on different tagsets, developed by different organizations and individuals. This paper reviews the developments in POS taggers and POS tagsets for Indian languages, which are essential computational linguistic tools for many NLP applications.

Selvadoss Thanamani Dr.Antony
