Cooperative unsupervised training of the part-of-speech taggers in a bidirectional machine translation system

When building a machine translation system, the embedded part-of-speech (PoS) tagger deserves special attention, since PoS ambiguities are one of the main sources of mistranslations, specially when related languages are involved. The standard statistical approach for PoS tagging are hidden Markov models (HMM) properly trained by collecting statistics from source-language texts. In the case of bidirectional machine translation systems, this kind of training is often individually performed on each PoS tagger without taking into account the other language, that is, the corresponding target language. But target-language information may help to improve performance. In this paper, a new method is proposed which trains both PoS taggers simultaneously using mutual interaction: at every iteration, the parameters of the HMM corresponding to one of the languages are refined by using the statistical data supplied by the current HMM for the other language. Both models bootstrap by learning cooperatively in an unsupervised manner and require only monolingual texts; no aligned texts are needed. Preliminary results are promising and surpass those of traditional unsupervised approaches.