Part-of-Speech Tagging of Portuguese Using Hidden Markov Models with Character Language Model Emissions

This paper presents a probabilistic approach to part-of-speech (POS) tagging that combines hidden Markov models (HMMs) with character language models, applied to Portuguese texts. In this approach, the emission probabilities of each hidden state of an HMM are estimated by a corresponding character language model. The resulting tagger was trained and tested on Bosque, a subset of the Floresta Sintá(c)tica treebank, reaching 96.2% accuracy with a 39-tag tagset and 92.0% with a 257-tag tagset extended with inflection information.
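
To make the idea concrete, the following is a minimal sketch of an HMM tagger whose emission probabilities come from one character language model per tag. It is an illustration under stated assumptions, not the system evaluated here: the character bigram order, the add-one smoothing, the first-order tag transitions, the class names (`CharBigramLM`, `HMMTagger`), and the toy training sentences are all hypothetical choices made for brevity.

```python
import math
from collections import defaultdict

BOS, EOS = "^", "$"  # word-boundary symbols for the character model (assumed)


class CharBigramLM:
    """Add-one smoothed character bigram model for the words of one tag."""

    def __init__(self, alphabet_size=256):  # alphabet size is an assumption
        self.bigrams = defaultdict(int)
        self.contexts = defaultdict(int)
        self.alphabet_size = alphabet_size

    def train(self, word):
        chars = BOS + word + EOS
        for prev, cur in zip(chars, chars[1:]):
            self.bigrams[prev, cur] += 1
            self.contexts[prev] += 1

    def logprob(self, word):
        """log P(word | tag) as a product of smoothed character bigrams."""
        chars = BOS + word + EOS
        return sum(
            math.log((self.bigrams[prev, cur] + 1)
                     / (self.contexts[prev] + self.alphabet_size))
            for prev, cur in zip(chars, chars[1:])
        )


class HMMTagger:
    """First-order HMM: tags are hidden states, character LMs are emissions."""

    START = "<s>"

    def __init__(self):
        self.emit = defaultdict(CharBigramLM)  # one character LM per tag
        self.trans = defaultdict(int)
        self.trans_ctx = defaultdict(int)
        self.tags = set()

    def train(self, tagged_sentences):
        for sentence in tagged_sentences:
            prev = self.START
            for word, tag in sentence:
                self.emit[tag].train(word.lower())
                self.trans[prev, tag] += 1
                self.trans_ctx[prev] += 1
                self.tags.add(tag)
                prev = tag

    def _log_trans(self, prev, tag):
        # Add-one smoothed tag bigram probability.
        return math.log((self.trans[prev, tag] + 1)
                        / (self.trans_ctx[prev] + len(self.tags)))

    def tag(self, words):
        """Viterbi decoding; returns the most probable tag sequence."""
        best = {self.START: (0.0, [])}  # state -> (log score, tag path)
        for word in words:
            new_best = {}
            for t in sorted(self.tags):
                emission = self.emit[t].logprob(word.lower())
                score, path = max(
                    (s + self._log_trans(p, t), pth)
                    for p, (s, pth) in best.items()
                )
                new_best[t] = (score + emission, path + [t])
            best = new_best
        return max(best.values())[1]


# Toy, hand-made training data (hypothetical; not taken from Bosque).
TRAIN = [
    [("o", "art"), ("gato", "n"), ("dorme", "v")],
    [("a", "art"), ("menina", "n"), ("canta", "v")],
    [("o", "art"), ("menino", "n"), ("corre", "v")],
    [("a", "art"), ("gata", "n"), ("come", "v")],
]

tagger = HMMTagger()
tagger.train(TRAIN)
print(tagger.tag(["o", "gato", "dorme"]))  # prints ['art', 'n', 'v']
```

Because the emission model scores character sequences rather than whole word forms, a word never seen in training still receives a nonzero probability under every tag, so no separate unknown-word component is needed in this sketch.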