Using Language and Translation Models to Select the Best among Outputs from Multiple MT Systems

This paper addresses the problem of automatically selecting the best output from among those produced by multiple machine translation (MT) systems. Existing approaches select the output that a target language model scores highest, but this does not always work well. This paper proposes two methods to improve selection performance. The first method is based on a multiple comparison test and checks whether the score that language and translation models assign to one output is significantly higher than the scores of the others. The second method is based on the probability that a translation is not inferior to the others, which is predicted from the same scores. Experimental results show that the proposed methods improve performance by 2 to 6%.
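
As a rough illustration of score-based selection, the sketch below picks the candidate that a toy scorer rates highest, and then shows a variant that accepts the top candidate only when it leads the runner-up by a fixed margin. Everything here is an assumption made for illustration: the add-one-smoothed unigram scorer, the function names, the `margin` threshold, and the fallback to the first system's output are hypothetical, and the margin check is only a simplified stand-in for a multiple comparison test, not the procedure used in this paper.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def make_unigram_scorer(corpus: Sequence[str]) -> Callable[[str], float]:
    """Build a toy scorer: average add-one-smoothed unigram log-probability.

    Hypothetical stand-in for a real target language model.
    """
    counts = Counter(tok for sent in corpus for tok in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def score(sentence: str) -> float:
        toks = sentence.split()
        if not toks:
            return float("-inf")
        logp = sum(math.log((counts[t] + 1) / (total + vocab)) for t in toks)
        return logp / len(toks)  # length-normalized

    return score


def select_best(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Baseline: return the candidate with the highest score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def select_with_margin(
    candidates: Sequence[str],
    score: Callable[[str], float],
    margin: float,
) -> str:
    """Return the top candidate only if it beats the runner-up by `margin`.

    Otherwise fall back to candidates[0], treated here as a default system.
    The fixed margin is an assumed simplification, not a statistical test.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    ranked = sorted(candidates, key=score, reverse=True)
    if len(ranked) == 1 or score(ranked[0]) - score(ranked[1]) >= margin:
        return ranked[0]
    return candidates[0]


corpus = ["the cat sat on the mat", "the dog sat on the rug"]
scorer = make_unigram_scorer(corpus)
outputs = ["zzz qqq xxx", "the cat sat"]  # outputs[0] plays the default system
best = select_best(outputs, scorer)
cautious = select_with_margin(outputs, scorer, margin=2.0)
```

With these toy inputs, `select_best` prefers the in-vocabulary sentence, while `select_with_margin` with a large margin keeps the default system's output because the score gap is too small to count as a clear win.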