Data Augmentation for Sign Language Gloss Translation

Sign language translation (SLT) is often decomposed into video-to-gloss recognition and gloss-to-text translation, where a gloss is a sequence of transcribed spoken-language words in the order in which they are signed. We focus here on gloss-to-text translation, which we treat as a low-resource neural machine translation (NMT) problem. However, gloss-to-text translation differs from traditional low-resource NMT in that gloss-text pairs often have higher lexical overlap and lower syntactic overlap than pairs of spoken languages. We exploit this lexical overlap and handle the syntactic divergence by proposing two rule-based heuristics that generate pseudo-parallel gloss-text pairs from monolingual spoken-language text. By pre-training on this synthetic data, we improve translation from American Sign Language (ASL) to English and from German Sign Language (DGS) to German by up to 3.14 and 2.20 BLEU, respectively.
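The abstract does not spell out the two heuristics, so the following is only a rough illustration of the general idea of turning monolingual text into pseudo-parallel gloss-text pairs. It is a minimal Python sketch under assumed toy rules: drop function words, map inflected forms to a lemma, uppercase the result, and optionally move a verb to the end of the sequence as a crude stand-in for handling word-order divergence. The word lists `DROP_WORDS`, `LEMMAS` and `VERBS` and the functions `text_to_gloss` and `build_pseudo_parallel` are hypothetical names introduced here; they are not the authors' rules or code.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy resources, for illustration only. A realistic pipeline
# would use a lemmatizer and a POS tagger instead of hand-written tables.
DROP_WORDS: Set[str] = {"a", "an", "the", "is", "are", "am", "was", "were", "to", "of"}
LEMMAS: Dict[str, str] = {"went": "go", "likes": "like", "apples": "apple", "dogs": "dog"}
VERBS: Set[str] = {"go", "like", "eat", "see"}


def text_to_gloss(sentence: str, verb_final: bool = False) -> List[str]:
    """Map a spoken-language sentence to a pseudo-gloss token sequence.

    Assumed heuristic: lowercase, strip edge punctuation, drop function
    words, replace known inflected forms with a lemma, and uppercase the
    result. With verb_final=True, the first known verb is moved to the end
    as a crude stand-in for reordering toward verb-final word order.
    """
    lemmas: List[str] = []
    for raw in sentence.split():
        tok = raw.strip(".,!?;:\"'").lower()
        if not tok or tok in DROP_WORDS:
            continue  # skip empty tokens and function words
        lemmas.append(LEMMAS.get(tok, tok))
    if verb_final:
        for i, lemma in enumerate(lemmas):
            if lemma in VERBS:
                lemmas.append(lemmas.pop(i))  # move the verb to the end
                break  # stop iterating right after mutating the list
    return [lemma.upper() for lemma in lemmas]


def build_pseudo_parallel(
    monolingual: Sequence[str], verb_final: bool = False
) -> List[Tuple[str, str]]:
    """Pair each synthetic gloss (source side) with its original sentence (target side)."""
    pairs: List[Tuple[str, str]] = []
    for sentence in monolingual:
        gloss = " ".join(text_to_gloss(sentence, verb_final))
        if gloss:  # skip sentences that consist only of dropped words
            pairs.append((gloss, sentence))
    return pairs


if __name__ == "__main__":
    corpus = ["The dog likes apples.", "She went to the store."]
    for src, tgt in build_pseudo_parallel(corpus, verb_final=True):
        print(f"{src} ||| {tgt}")
    # DOG APPLE LIKE ||| The dog likes apples.
    # SHE STORE GO ||| She went to the store.
```

In this sketch the synthetic gloss is always the source side and the untouched sentence is the target side, so the decoder only ever learns to produce real, fluent text. This is similar in spirit to back-translation; a model would be pre-trained on such pairs and then fine-tuned on the genuine gloss-text data.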

Yoav Goldberg | Amit Moryossef | Kayo Yin | Graham Neubig