Persian Part-of-Speech Tagger Based on a Hidden Markov Model

This paper introduces a Persian part-of-speech (POS) tagger based on a Hidden Markov Model (HMM). The tagger is part of the Persian text-to-speech (TTS) system ParsGooyan, where it supports several TTS tasks: phrase break detection, homograph disambiguation, and lexical stress search. A POS lexicon with 61,521 entries and 64,003 trigrams is used as the language model. The tagger is implemented in Festival and uses the Viterbi decoder provided by the Edinburgh Speech Tools. Its average overall accuracy is 95.11%; accuracy on known and unknown words is 96.136% and 60.25%, respectively.
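To illustrate how Viterbi decoding selects a tag sequence in an HMM tagger, the following is a minimal Python sketch. It is not the implementation described here: the tagger itself relies on the Viterbi decoder from the Edinburgh Speech Tools and a trigram model, whereas this sketch uses a simpler bigram model. The function name `viterbi`, the `unk_p` fallback for unknown words, the tag set, the English toy words, and all probabilities are assumptions made up for illustration only.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unk_p=1e-6):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    `unk_p` is a hypothetical fallback emission probability for words
    that a tag has never been seen with (a crude unknown-word model).
    """
    if not words:
        return []

    def lp(p):
        # Log-probability; zero probability maps to negative infinity.
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: log-probability of the best path ending in tag t at word i.
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], unk_p))
             for t in tags}]
    # back[i][t]: previous tag on that best path.
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], unk_p))
            back[i][t] = prev

    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model: all values below are invented for illustration.
TAGS = ["DET", "N", "V"]
START_P = {"DET": 0.6, "N": 0.3, "V": 0.1}
TRANS_P = {
    "DET": {"N": 0.9, "V": 0.1},
    "N": {"DET": 0.1, "N": 0.3, "V": 0.6},
    "V": {"DET": 0.5, "N": 0.4, "V": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "N": {"dog": 0.5, "barks": 0.1},
    "V": {"dog": 0.05, "barks": 0.5},
}

if __name__ == "__main__":
    print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
```

In this toy model the word "barks" can be emitted by both N and V, and the transition probabilities from the preceding tag decide between them. A trigram tagger works the same way, except that each transition is conditioned on the two preceding tags rather than one.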