Using WSD Techniques for Lexical Selection in Statistical Machine Translation

Abstract: In current state-of-the-art statistical MT systems, word choice in the target language is governed implicitly by a combination of "phrase" selection and language modeling. In contrast, the state of the art in word sense disambiguation (WSD) takes advantage of a wide array of features, both local and document-level. This technical report describes our initial efforts to use WSD techniques to guide a state-of-the-art statistical MT system toward better word choices. We briefly discuss the principles underlying our approach and contrast them with another recent attempt to integrate WSD with statistical MT (Carpuat and Wu, 2005), which yielded negative results. We then describe our approach, which leads to a small improvement in translation performance over a state-of-the-art phrase-based statistical MT system. Qualitative analysis of the translation output suggests there are still significant opportunities to improve performance further.