Precise measurement method of a speech translation system’s capability with a paired comparison method between the system and humans

The main goal of the present paper is to propose new schemes for the overall evaluation of a speech translation system. These schemes are expected to support and improve the design of the target application system, and precisely determine its performance. Experiments are conducted on the Japanese-to-English speech translation system ATR-MATRIX, which was developed at ATR Interpreting Telecommunications Research Laboratories. In the proposed schemes, the system’s translations are compared with those of a native Japanese taking the Test of English for International Communication (TOEIC), which is used as a measure of one’s speech translation capability. Subjective and automatic comparisons are made and the results are compared. A regression analysis on the subjective results shows that the speech translation capability of ATR-MATRIX matches a Japanese person scoring around 500 on the TOEIC. The automatic comparisons also show promising results.