Automatic Correction of ASR Outputs by Using Machine Translation

One of the main challenges when working with a domain-independent automatic speech recognizer (ASR) is to correctly transcribe rare or out-of-vocabulary words that are not included in the language model or whose probabilities are underestimated. Although the usual solution is to adapt the language models and pronunciation vocabularies, in some settings, such as when using free online recognizers, this is not possible, and post-recognition corrections must be applied instead. In this paper, we propose an automatic correction procedure that applies a phrase-based machine translation system, trained on both word and phonetic-encoding representations, to the n-best lists generated by the ASR. Our experiments on two different datasets, human-computer interfaces for robots and human-to-human dialogs about tourism information, show that the proposed methodology provides a quick and robust mechanism to improve ASR performance by reducing both the word error rate (WER) and the character error rate (CER).
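As a rough illustration of this pipeline (a minimal sketch, not the authors' implementation), the Python snippet below corrects each hypothesis in an ASR n-best list using a toy phrase table of error-to-correction mappings and measures the resulting word error rate. The phrase table, n-best scores, and function names are hypothetical placeholders for what a phrase-based SMT system trained on word and phonetic representations would produce.

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: token-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(r)][len(h)] / max(len(r), 1)

# Toy phrase table (hypothetical): in the paper's setting this mapping
# would be learned by a phrase-based SMT system from parallel
# ASR-output / reference-transcription data.
PHRASE_TABLE = {
    "go to the kitten": "go to the kitchen",
    "grab the back": "grab the bag",
}

def correct(hypothesis: str) -> str:
    """Apply phrase-table substitutions to a single ASR hypothesis."""
    for src, tgt in PHRASE_TABLE.items():
        hypothesis = hypothesis.replace(src, tgt)
    return hypothesis

def correct_nbest(nbest):
    """Correct every entry of a (score, hypothesis) n-best list."""
    return [(score, correct(hyp)) for score, hyp in nbest]

if __name__ == "__main__":
    reference = "please go to the kitchen"
    nbest = [(-1.2, "please go to the kitten"),
             (-2.7, "please go to the kitchen")]
    for score, hyp in correct_nbest(nbest):
        print(f"{score:+.1f}  {hyp}  WER={wer(reference, hyp):.2f}")
```

A full system would additionally rescore the corrected n-best list and could fall back on phonetic-encoding matches (e.g., double metaphone codes) when surface-form phrases do not match; both steps are omitted here for brevity.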
