Online Sentence Segmentation for Simultaneous Interpretation using Multi-Shifted Recurrent Neural Network

This paper develops a recurrent neural network (RNN) solution for segmenting the unpunctuated transcripts generated by automatic speech recognition for simultaneous interpretation. RNNs are effective at capturing long-distance dependencies and are straightforward to use for online decoding, which makes them better suited to the task than conventional n-gram language model (LM) based approaches and recent neural machine translation based approaches. This paper proposes a multi-shifted RNN to address the trade-off between accuracy and latency, which is one of the key characteristics of the task. Experiments show that, compared to a strong baseline of the RNN LM-based method, the proposed method improves segmentation accuracy measured in F1 by 21.1% while maintaining approximately the same latency, and reduces the BLEU loss relative to the oracle segmentation by 28.6%. Our online sentence segmentation toolkit is open-sourced to promote research in the field.
