Considerations in maximum mutual information and minimum classification error training for statistical machine translation

Discriminative training methods are used in statistical machine translation to introduce and combine additional knowledge sources within the translation process. Although these methods are described in the literature and comparative studies are available for speech recognition, applying discriminative training to statistical machine translation raises additional considerations. In this paper we compare and formalize discriminative training criteria and their respective optimization methods, with the goal of improving translation performance as measured by the corpus-level BLEU metric for a Viterbi beam-based decoder. We frame this work within current trends in discriminative training and present reproducible results that highlight both the potential and the shortcomings of N-best list based discriminative training.