HMM Based Chunker for Hindi

This paper presents an HMM-based chunk tagger for Hindi. Various tagging schemes for marking chunk boundaries are discussed along with their results. Contextual information is incorporated into the chunk tags in the form of part-of-speech (POS) information. This information is also added to the tokens themselves to achieve better precision. Error analysis is carried out to identify and reduce the most common errors. It is found that for certain classes of words, using the POS tag alone as the token is more effective than using a combination of word and POS tag. Finally, the chunks are also assigned chunk labels.
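The abstract does not spell out the model, so the following is only a minimal sketch of the general idea under stated assumptions: a first-order (bigram) HMM decoded with Viterbi, whose hidden states are chunk-boundary tags with the POS tag appended and whose observations are word_POS tokens. The two-tag B/I boundary scheme, the add-one smoothing, the function names (`train`, `viterbi`, `chunks`), the POS labels, and the two romanized Hindi training sentences are all hypothetical illustrations. They are not the paper's tag sets, data, or implementation, and the paper itself discusses several boundary-marking schemes.

```python
import math
from collections import defaultdict

# Minimal bigram HMM chunk tagger (sketch). Hypothetical choices, not the
# paper's: "B" = chunk start, "I" = inside chunk; the POS tag is appended to
# the boundary tag (e.g. "B_NN") and to the word (e.g. "seb_NN").

def train(sentences):
    """Count tag transitions and tag->token emissions.

    sentences: list of sentences, each a list of (word, pos, boundary) triples.
    """
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start state
        for word, pos, boundary in sent:
            tag = f"{boundary}_{pos}"  # POS folded into the chunk tag
            token = f"{word}_{pos}"    # POS folded into the observed token
            trans[prev][tag] += 1
            emit[tag][token] += 1
            vocab.add(token)
            prev = tag
    return trans, emit, vocab


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log P(key) from a table of counts."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(tokens, trans, emit, vocab):
    """Return the most probable tag sequence for the observed tokens."""
    if not tokens:
        return []
    tags = list(emit)
    n_tags = len(tags)
    n_vocab = len(vocab) + 1  # one extra slot for unseen tokens
    # best[i][t] = (log score of best path ending in t at i, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], tokens[0], n_vocab), None) for t in tags}]
    for i in range(1, len(tokens)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], tokens[i], n_vocab)
            column[t] = max((best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                            for p in tags)
        best.append(column)
    # Backtrack from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


def chunks(words, tags):
    """Group words into chunks; a new chunk opens at every "B_*" tag (or the first word)."""
    out = []
    for word, tag in zip(words, tags):
        if tag.startswith("B_") or not out:
            out.append([word])
        else:
            out[-1].append(word)
    return out


# Two toy romanized Hindi sentences with illustrative (hypothetical) POS labels.
TRAIN = [
    [("ram", "NNP", "B"), ("ne", "PREP", "I"), ("seb", "NN", "B"), ("khaya", "VFM", "B")],
    [("sita", "NNP", "B"), ("ne", "PREP", "I"), ("aam", "NN", "B"), ("khaya", "VFM", "B")],
]
trans, emit, vocab = train(TRAIN)
words = ["sita", "ne", "seb", "khaya"]
pos = ["NNP", "PREP", "NN", "VFM"]
predicted = viterbi([f"{w}_{p}" for w, p in zip(words, pos)], trans, emit, vocab)
print(predicted)                 # ['B_NNP', 'I_PREP', 'B_NN', 'B_VFM']
print(chunks(words, predicted))  # [['sita', 'ne'], ['seb'], ['khaya']]
```

Because the POS tag appears in both the state and the token, the sketch can be varied along the lines the abstract describes: for instance, using only the POS tag as the token for selected word classes (consistently in `train` and in the decoding input) would be one way to probe the finding that POS alone can outperform word plus POS for some classes.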
