QCRI Live Speech Translation System

We present QCRI’s Arabic-to-English speech translation system. It features modern web technologies to capture live audio, and broadcasts Arabic transcriptions and English translations simultaneously. Our Kaldi-based ASR system uses the Time Delay Neural Network architecture, while our Machine Translation (MT) system uses both phrase-based and neural frameworks. Although our neural MT system is slower than the phrase-based system, it produces significantly better translations and is memory efficient.

[1]  Marcin Junczys-Dowmunt,et al.  Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions , 2016, IWSLT.

[2]  Nadir Durrani,et al.  A Joint Sequence Translation Model with Integrated Reordering , 2011, ACL.

[3]  Sameer Khurana,et al.  QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition: MGB-2 challenge , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[4]  Richard M. Schwartz,et al.  Fast and Robust Neural Network Joint Models for Statistical Machine Translation , 2014, ACL.

[5]  Andrew W. Senior,et al.  Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.

[6]  Nadir Durrani,et al.  QCRI Machine Translation Systems for IWSLT 16 , 2017, ArXiv.

[7]  Rico Sennrich,et al.  Edinburgh Neural Machine Translation Systems for WMT 16 , 2016, WMT.

[8]  Sanjeev Khudanpur,et al.  A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.

[9]  Nadir Durrani,et al.  Integrating an Unsupervised Transliteration Model into Statistical Machine Translation , 2014, EACL.

[10]  George Saon,et al.  Speaker adaptation of neural network acoustic models using i-vectors , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[11]  Tanja Schultz,et al.  Grapheme based speech recognition , 2003, INTERSPEECH.

[12]  James R. Glass,et al.  The MGB-2 challenge: Arabic multi-dialect broadcast media recognition , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[13]  Yiming Wang,et al.  Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI , 2016, INTERSPEECH.

[14]  Sanjeev Khudanpur,et al.  Audio augmentation for speech recognition , 2015, INTERSPEECH.