Corpus analysis of simultaneous interpretation data for improving real time speech translation

Real-time speech-to-speech (S2S) translation of lectures and speeches requires simultaneous translation with low latency to keep listeners continually engaged. However, simultaneous speech-to-speech translation systems have predominantly repurposed translation models trained for consecutive translation, without a principled attempt to model incrementality. Furthermore, the notion of interpretation is simplified to translation plus simultaneity. In contrast, human interpreters perform simultaneous interpretation by generating target speech incrementally with a very low ear-voice span, using a variety of strategies such as compression (paraphrasing), incremental comprehension, and anticipation through discourse inference and the expectation of discourse redundancies. Exploiting and modeling such phenomena can potentially improve automatic real-time translation of speech. As a first step, in this work we identify and systematically analyze the phenomena human interpreters use to perform simultaneous interpretation, and elucidate how they can be exploited in a conventional simultaneous translation framework. We perform our study on a corpus of simultaneous interpretation of Parliamentary speeches in English and Spanish. Specifically, we present an empirical analysis of factors such as time constraints, redundancy, and inference as evidenced in the simultaneous interpretation corpus.
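To make the measured phenomena concrete, the following is a minimal sketch of how ear-voice span and compression could be quantified from a time-aligned interpretation corpus. The Segment structure, field names, and corpus format are assumptions introduced here for illustration only; they are not the paper's actual tooling or data format.

```python
# Hypothetical sketch: per-segment ear-voice span and compression ratio
# computed from time-aligned source/interpreter segments (assumed format).
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    src_start: float   # onset of the source speech segment (seconds)
    tgt_start: float   # onset of the interpreter's rendition (seconds)
    src_words: int     # word count of the source segment
    tgt_words: int     # word count of the interpreted segment


def ear_voice_span(seg: Segment) -> float:
    """Lag between hearing the source and voicing its translation."""
    return seg.tgt_start - seg.src_start


def compression_ratio(seg: Segment) -> float:
    """Values below 1.0 suggest the interpreter compressed (paraphrased)."""
    return seg.tgt_words / seg.src_words


def summarize(corpus: list[Segment]) -> dict:
    """Corpus-level averages of the two per-segment measures."""
    return {
        "mean_ear_voice_span_s": mean(ear_voice_span(s) for s in corpus),
        "mean_compression_ratio": mean(compression_ratio(s) for s in corpus),
    }


if __name__ == "__main__":
    toy = [Segment(0.0, 2.1, 12, 9), Segment(5.4, 7.0, 10, 10)]
    print(summarize(toy))
```

Such per-segment measures are only a rough proxy; in practice the segmentation and word alignment between source and interpreted speech would themselves have to be established first (e.g., with a statistical aligner).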
