End-to-End Evaluation in JANUS: A Speech-to-speech Translation System

JANUS is a multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneous conversation in a limited domain. In this paper we describe our methodology for evaluating translation performance. Our current focus is on end- to- end evaluations- the evaluation of the translation capabilities of the system as a whole. The main goal of our end-to-end evaluation procedure is to determine translation accuracy on a test set of previously unseen dialogues. Other goals include evaluating the effectiveness of the system in conveying domain-relevant information and in detecting and dealing appropriately with utterances (or portions of utterances) that are out-of-domain. End-to-end evaluations are performed in order to verify the general coverage of our knowledge sources, guide our development efforts, and to track our improvement over time. We discuss our evaluation procedures, the criteria used for assigning scores to translations produced by the system, and the tools developed for performing this task. Recent Spanish-to-English performance evaluation results are presented as an example.