Paper information - Persian part of speech tagger based on Hidden Markov Model

Persian part of speech tagger based on Hidden Markov Model

This paper introduces a Persian part-of-speech (POS) tagger based on hidden Markov models (HMMs). The tagger is part of ParsGooyan, a Persian text-to-speech (TTS) system, and supports TTS tasks such as phrase break detection, homograph disambiguation, and lexical stress search. The language model uses a POS lexicon with 61,521 entries and 64,003 trigrams. The tagger is implemented in the Festival software and uses the Viterbi decoder provided by the Edinburgh Speech Tools. Its average overall accuracy is 95.11%; accuracy on known and unknown words is 96.136% and 60.25%, respectively.
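To illustrate the kind of decoding an HMM tagger performs, here is a minimal Viterbi sketch in Python. It is not the Festival or Edinburgh Speech Tools implementation: it assumes a simpler bigram tag model rather than the trigram model described in the abstract, and the tag set, English toy words, probabilities, and the `unk_p` floor for unknown words are all hypothetical values chosen for illustration.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, unk_p=1e-6):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    unk_p is an assumed small emission probability for words missing
    from the lexicon (a crude stand-in for real unknown-word handling).
    """
    if not words:
        return []
    # best[i][t]: log-probability of the best path ending in tag t at position i
    # back[i][t]: the previous tag on that best path
    best = [{t: math.log(start_p[t]) + math.log(emit_p[t].get(words[0], unk_p))
             for t in tags}]
    back = [{t: None for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            emit = math.log(emit_p[t].get(words[i], unk_p))
            # Pick the predecessor tag that maximises the path score into t.
            prev = max(tags, key=lambda p: best[i - 1][p] + math.log(trans_p[p][t]))
            best[i][t] = best[i - 1][prev] + math.log(trans_p[prev][t]) + emit
            back[i][t] = prev
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path

# Hypothetical toy model (not taken from the paper).
TAGS = ["DET", "N", "V"]
START = {"DET": 0.6, "N": 0.3, "V": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "N": 0.9, "V": 0.05},
    "N":   {"DET": 0.1,  "N": 0.3, "V": 0.6},
    "V":   {"DET": 0.5,  "N": 0.4, "V": 0.1},
}
EMIT = {
    "DET": {"the": 0.9},
    "N":   {"dog": 0.5, "barks": 0.1},
    "V":   {"barks": 0.6, "dog": 0.05},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
```

Working in log space avoids numeric underflow on long sentences. A trigram tagger would condition each transition on the two preceding tags instead of one, which enlarges the state space but leaves the dynamic-programming structure unchanged.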
