A study on the stability and effectiveness of features in quality estimation for spoken language translation

A quality estimation (QE) approach informed by machine translation (MT) and automatic speech recognition (ASR) features has recently been shown to improve the performance of a spoken language translation (SLT) system in an in-domain scenario. When domain mismatch is progressively introduced into the MT and ASR systems, the SLT system's performance naturally degrades, and the use of QE to improve SLT performance has not been studied in this context. In this paper we investigate the effectiveness of QE under such conditions. Our experiments show that, across moderate levels of domain mismatch, QE leads to consistent translation improvements of around 0.4 BLEU. The QE system relies on 116 features derived from the inputs and outputs of the ASR and MT systems. We conducted a feature analysis to understand which information sources contribute most to the performance improvements. Linear discriminant analysis (LDA) dimension reduction was used to summarise the effective features into sets as small as 3 without degrading SLT performance. By inspecting the resulting components, eight features, including the acoustic model scores and count-based word statistics on the bilingual text, were found to be critically important, leading to a further gain of around 0.1 BLEU over the full feature set. These findings suggest promising directions for future work, such as incorporating the most effective QE features into SLT system training or decoding.
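To make the feature summarisation step concrete, the sketch below shows how a 116-dimensional QE feature matrix could be reduced to 3 LDA components and then inspected for its dominant features. This is a minimal illustration on synthetic data using scikit-learn, not the paper's implementation: the feature values, the quality scores, and the binning of continuous scores into discrete classes (which LDA requires) are all assumptions made for the example.

```python
# Hedged sketch: LDA dimension reduction over QE features, synthetic data only.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 116))      # 116 ASR/MT-derived QE features per segment (synthetic)
y = rng.uniform(0.0, 1.0, size=1000)  # segment-level quality scores, e.g. 1 - TER (synthetic)

# LDA is supervised and needs discrete class labels, so bin the continuous
# quality scores; the quartile binning here is an assumption for illustration.
bins = np.quantile(y, [0.25, 0.5, 0.75])
y_cls = np.digitize(y, bins)          # 4 classes: 0..3

# LDA allows at most (n_classes - 1) components, hence 4 bins for 3 dimensions.
lda = LinearDiscriminantAnalysis(n_components=3)
X_reduced = lda.fit_transform(X, y_cls)

# Rows of scalings_ index the original features; large absolute weights point
# at the features that dominate each reduced component.
top = np.argsort(-np.abs(lda.scalings_), axis=0)[:8]
print("top feature indices per component:\n", top)

# A quality estimator (SVR is common in QE setups) can then be trained on the
# 3-dimensional representation instead of all 116 features.
svr = SVR(kernel="rbf").fit(X_reduced, y)
print("sample predictions:", svr.predict(X_reduced[:3]))
```

Note that the cap of (n_classes - 1) on the number of LDA components is why the sketch uses four quality bins to obtain three dimensions; with a different binning, the achievable reduction would change accordingly.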
