Paper information - Disentangling ASR and MT Errors in Speech Translation

Disentangling ASR and MT Errors in Speech Translation

This paper investigates automatic quality assessment for spoken language translation (SLT). More precisely, we study SLT errors that may originate in either the transcription (ASR) module or the translation (MT) module. We detect SLT errors automatically with a single classifier based on joint ASR and MT features, and evaluate it on both 2-class (good/bad) and 3-class (good/badASR/badMT) labeling tasks. The 3-class problem requires disentangling ASR and MT errors in the speech translation output, and we propose two label extraction methods for this non-trivial step. As a by-product, this enables a qualitative analysis of SLT errors and their origin (are they due to the transcription step or the translation step?) on our large in-house corpus for French-to-English speech translation.
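The relationship between the two labeling tasks can be illustrated with a minimal sketch. It assumes only what the abstract states: every word carries one of the labels good, badASR, or badMT, and both bad labels count as bad in the 2-class task. The function names and the toy label sequence are hypothetical, invented for illustration; this is not the paper's label extraction method, which the abstract does not describe.

```python
from collections import Counter

# The three word-level labels named in the abstract.
THREE_CLASS = ("good", "badASR", "badMT")


def collapse_to_two_class(labels):
    """Map 3-class word labels onto the 2-class (good/bad) scheme."""
    for label in labels:
        if label not in THREE_CLASS:
            raise ValueError(f"unknown label: {label!r}")
    return ["good" if label == "good" else "bad" for label in labels]


def error_origin_counts(labels):
    """Count how many bad words are attributed to ASR and to MT."""
    counts = Counter(labels)
    return {"badASR": counts["badASR"], "badMT": counts["badMT"]}


# Toy labels, invented for illustration only (not taken from the corpus).
toy = ["good", "badASR", "good", "badMT", "badMT"]
two_class = collapse_to_two_class(toy)
origins = error_origin_counts(toy)
```

Counting the two bad labels separately is the simplest form of the error-origin analysis mentioned above; the 2-class view discards exactly that distinction.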

Benjamin Lecouteux | Laurent Besacier | Ngoc-Tien Le
