Automatic Long Sentence Segmentation for Neural Machine Translation

Neural machine translation (NMT) is an emerging machine translation paradigm that translates text with an encoder-decoder neural architecture. Recent studies have found that translation quality drops significantly when NMT systems translate long sentences. In this paper, we propose a novel method that addresses this issue by segmenting long sentences into several clauses. We introduce a split-and-reordering model that jointly detects the optimal sequence of segmentation points in a long source sentence. Each segmented clause is translated independently by the NMT system into a target clause, and the translated target clauses are then concatenated, without reordering, to form the final translation of the long sentence. On NIST Chinese-English translation tasks, our segmentation method achieves a substantial improvement over the NMT baseline of 2.94 BLEU points on sentences longer than 30 words and 5.43 BLEU points on sentences longer than 40 words.
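The overall translate-by-clauses pipeline can be sketched as follows. This is a minimal illustration only: `segment_long_sentence` here is a naive fixed-length splitter standing in for the paper's learned split-and-reordering model, and `translate_clause` is a dummy placeholder for a real NMT system; both names are hypothetical, not from the paper.

```python
def segment_long_sentence(words, max_len=30):
    """Naive stand-in for the split model: cut the word sequence into
    chunks of at most max_len words at fixed positions. The paper instead
    learns where to place segmentation points."""
    return [words[i:i + max_len] for i in range(0, len(words), max_len)]

def translate_clause(clause):
    """Placeholder for an NMT system translating one source clause into
    a target clause (dummy word-for-word mapping here)."""
    return ["tgt_" + w for w in clause]

def translate_long_sentence(sentence, max_len=30):
    """Translate a sentence; long sentences are segmented into clauses,
    each clause is translated independently, and the target clauses are
    concatenated in source order (no reordering), as in the paper."""
    words = sentence.split()
    if len(words) <= max_len:
        return " ".join(translate_clause(words))
    clauses = segment_long_sentence(words, max_len)
    return " ".join(w for c in clauses for w in translate_clause(c))
```

Concatenating target clauses in source order is the design choice the abstract highlights: it avoids a separate target-side reordering step, which is why the source-side split model must already choose clause boundaries that translate well independently.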
