论文信息 - Speech Recognition System Combination for Machine Translation

Speech Recognition System Combination for Machine Translation

The majority of state-of-the-art speech recognition systems make use of system combination. The combination approaches adopted have traditionally been tuned to minimising word error rates (WERs). In recent years there has been a growing interest in taking the output from speech recognition systems in one language and translating it into another. This paper investigates the use of cross-site combination approaches in terms of both WER and impact on translation performance. In addition, the stages involved in modifying the output from a speech-to-text (STT) system to be suitable for translation are described. Two source languages, Mandarin and Arabic, are recognised and then translated using a phrase-based statistical machine translation system into English. Performance of individual systems and cross-site combination using cross-adaptation and ROVER are given. Results show that the best STT combination scheme in terms of WER is not necessarily the most appropriate when translating speech.

[1] Jean-Luc Gauvain,et al. The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[2] Hermann Ney,et al. A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[3] Dongxin Xu,et al. The BBN Mandarin broadcast news transcription system , 2005, INTERSPEECH.

[4] Daniel Marcu,et al. Statistical Phrase-Based Translation , 2003, NAACL.

[5] Ralph Weischedel,et al. A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .

[6] Sherif Abdou,et al. The BBN RT04 English broadcast news transcription system , 2005, INTERSPEECH.

[7] Matthew G. Snover,et al. A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[8] Richard M. Schwartz,et al. Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[9] Jonathan G. Fiscus,et al. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[10] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[11] Andreas Stolcke,et al. Finding consensus among words: lattice-based word error minimization , 1999, EUROSPEECH.

[12] Mark J. F. Gales,et al. The Cu-Htk Mandarin Broadcast News Transcription System , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[13] Gunnar Evermann,et al. Posterior probability decoding, confidence estimation and system combination , 2000 .

[14] Andreas Stolcke,et al. Automatic linguistic segmentation of conversational speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.