SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation

We present SyMGiza++, a tool that computes symmetric word alignment models and can take advantage of multi-processor systems. A series of fairly simple modifications to the original IBM/Giza++ word alignment models allows the symmetrized models to be updated between chosen iterations of the original training algorithms. We achieve a relative alignment quality improvement of more than 17% compared to Giza++ and MGiza++ on the standard Canadian Hansards task, while maintaining the speed improvements provided by the parallel computation capability of MGiza++. Furthermore, we evaluate the alignment models in the context of phrase-based statistical machine translation, where BLEU scores improve consistently when SyMGiza++ is used instead of Giza++ or MGiza++.