论文信息 - QCRI Live Speech Translation System

QCRI Live Speech Translation System

We present QCRI’s Arabic-to-English speech translation system. It features modern web technologies to capture live audio, and broadcasts Arabic transcriptions and English translations simultaneously. Our Kaldi-based ASR system uses the Time Delay Neural Network architecture, while our Machine Translation (MT) system uses both phrase-based and neural frameworks. Although our neural MT system is slower than the phrase-based system, it produces significantly better translations and is memory efficient.

[1] Marcin Junczys-Dowmunt,et al. Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions , 2016, IWSLT.

[2] Nadir Durrani,et al. A Joint Sequence Translation Model with Integrated Reordering , 2011, ACL.

[3] Sameer Khurana,et al. QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition: MGB-2 challenge , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[4] Richard M. Schwartz,et al. Fast and Robust Neural Network Joint Models for Statistical Machine Translation , 2014, ACL.

[5] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.

[6] Nadir Durrani,et al. QCRI Machine Translation Systems for IWSLT 16 , 2017, ArXiv.

[7] Rico Sennrich,et al. Edinburgh Neural Machine Translation Systems for WMT 16 , 2016, WMT.

[8] Sanjeev Khudanpur,et al. A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.

[9] Nadir Durrani,et al. Integrating an Unsupervised Transliteration Model into Statistical Machine Translation , 2014, EACL.

[10] George Saon,et al. Speaker adaptation of neural network acoustic models using i-vectors , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[11] Tanja Schultz,et al. Grapheme based speech recognition , 2003, INTERSPEECH.

[12] James R. Glass,et al. The MGB-2 challenge: Arabic multi-dialect broadcast media recognition , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[13] Yiming Wang,et al. Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI , 2016, INTERSPEECH.

[14] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.