Automatically acquiring a language model for POS tagging using decision trees

We present an algorithm that automatically acquires a statistically-based language model for POS tagging, using statistical decision trees. The learning algorithm deals with more complex contextual information than simple collections of n-grams and is able to use information of different kinds. The acquired models are independent enough to be easily incorporated, as a statistical core of constraints/rules, in any flexible tagger. They are also complete enough to be used directly as sets of POS disambiguation rules. We have implemented a simple and fast tagger that has been tested and evaluated on the WSJ corpus with remarkable accuracy. Comparative results are reported.