Probabilistic part-of-speech tagging using decision trees

In this paper, a new probabilistic tagging method is presented which avoids problems that Markov Model based taggers face, when they have to estimate transition probabilities from sparse data. In this tagging method, transition probabilities are estimated using a decision tree. Based on this method, a part-of-speech tagger (called TreeTagger) has been implemented which achieves 96.36 % accuracy on Penn-Treebank data which is better than that of a trigram tagger (96.06 %) on the same data.