Quality Estimation for Automatic Speech Recognition

We address the problem of estimating the quality of Automatic Speech Recognition (ASR) output at the utterance level, without recourse to manual reference transcriptions and when information about the system's confidence is not accessible. Given a source signal and its automatic transcription, we approach this problem as a regression task in which the word error rate of the transcribed utterance has to be predicted. To this end, we explore the contribution of different feature sets and the potential of different algorithms under testing conditions of increasing complexity. Results show that our automatic quality estimates closely approximate the word error rate scores calculated over reference transcripts, outperforming a strong baseline in all testing conditions.
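The regression target described above is the word error rate (WER), i.e. the word-level edit distance between the automatic transcription and a reference transcript, normalized by the reference length. As an illustrative sketch (not the authors' implementation), WER can be computed with a standard dynamic-programming edit distance over word sequences:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: minimum number of word substitutions,
    insertions, and deletions needed to turn the hypothesis into
    the reference, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub,            # substitution (or match)
                          d[i - 1][j] + 1,  # deletion
                          d[i][j - 1] + 1)  # insertion
    return d[len(ref)][len(hyp)] / len(ref)


# Example: one substitution in a three-word reference → WER of 1/3
print(wer("the cat sat", "the bat sat"))
```

In the quality-estimation setting, this score is only available at training time (where references exist); at test time a regression model must predict it from features of the signal and the transcription alone.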
