Improved machine translation of speech-to-text outputs

Automatic speech recognition (ASR) and machine translation (MT) are frequently combined in current research programs. This paper first presents several pre-processing steps that limit the performance degradation observed when translating an automatic transcription rather than a manual one. Automatically transcribed speech often differs significantly from the machine translation system’s training material with respect to casing, punctuation and word normalization. The proposed system outperforms the best system at the 2007 TC-STAR evaluation by almost 2 BLEU points. The paper then attempts to determine a criterion characterizing how well the output of a speech-to-text (STT) system can be translated, but the current experiments could only confirm that lower word error rates lead to better translations.

Index Terms: ASR, MT, segmentation, punctuation, normalization, joint optimization
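
As an illustration of the quantities mentioned in the abstract, the following minimal sketch computes a word error rate (WER) as the word-level edit distance divided by the reference length, and shows how casing and punctuation mismatches alone can inflate it. The function names `normalize` and `word_error_rate` and the example sentences are assumptions made for this sketch; they are not the paper's pre-processing pipeline or scoring tool.

```python
import string


def normalize(text: str) -> str:
    """Lowercase and strip ASCII punctuation (a deliberately crude normalization)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a cased, punctuated reference versus raw ASR-style output.
manual = "The session is resumed."
automatic = "the session is resumed"

raw_wer = word_error_rate(manual, automatic)
normalized_wer = word_error_rate(normalize(manual), normalize(automatic))
```

In this toy case the two strings contain the same words, yet the raw comparison counts two of the four reference words as errors purely because of casing and punctuation; after normalization the mismatch disappears. Real systems would typically use richer normalization rules than this sketch.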
