Target-language-driven agglomerative part-of-speech tag clustering for machine translation

This paper presents a method for reducing the set of different tags to be considered by a part-of-speech tagger. The method is based on a clustering algorithm applied to the states of a hidden Markov model. The model is first trained using information not only from the source language but also from the target language, by means of a recently proposed unsupervised technique for obtaining the taggers used in machine translation systems. Then, a bottom-up agglomerative clustering algorithm groups the states of the hidden Markov model according to a similarity measure based on their transition probabilities; this reduces complexity by merging the initial fine-grained tags into coarser ones. The experiments show that part-of-speech taggers using the coarser tags have lower error rates than those using the initial fine-grained tags. Moreover, using target-language information in the unsupervised training yields better clusters than those built, also without supervision, from source-language information only.
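The bottom-up grouping of states described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract only says that the similarity measure is based on transition probabilities, so the symmetrised Kullback-Leibler divergence between outgoing transition rows, the unweighted averaging of rows within a cluster, the stopping threshold, and the names `symmetric_kl`, `mean_row` and `cluster_states` are all assumptions made for illustration.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetrised KL divergence between two discrete distributions.

    Assumed dissimilarity measure; probabilities are floored at eps
    so that the logarithm is always defined.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = max(pi, eps), max(qi, eps)
        total += (pi - qi) * math.log(pi / qi)
    return total


def mean_row(transitions, members):
    """Unweighted average of the outgoing transition rows of a cluster (assumption)."""
    n = len(transitions[0])
    return [sum(transitions[s][j] for s in members) / len(members) for j in range(n)]


def cluster_states(transitions, threshold):
    """Greedy bottom-up clustering of HMM states.

    Starts with one cluster per state and repeatedly merges the two
    clusters whose averaged transition rows are closest, stopping when
    the smallest distance exceeds the (hypothetical) threshold.
    """
    clusters = [[i] for i in range(len(transitions))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = symmetric_kl(mean_row(transitions, clusters[a]),
                                 mean_row(transitions, clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy transition matrix for four fine-grained tags (made-up values).
TOY_TRANSITIONS = [
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.60, 0.20],
    [0.40, 0.40, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]

if __name__ == "__main__":
    print(cluster_states(TOY_TRANSITIONS, threshold=0.2))
```

In this toy matrix, states 0 and 1 have nearly identical outgoing distributions, so with a threshold of 0.2 they are merged into one coarse tag while states 2 and 3 remain separate. A fuller treatment would also merge the corresponding columns of the transition matrix and the emission probabilities, and would weight each state by how often it occurs; the sketch leaves these steps out.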
