A Discriminative Neural Model for Cross-Lingual Word Alignment

We introduce a novel discriminative word alignment model, which we integrate into a Transformer-based machine translation model. In experiments based on a small number of labeled examples (∼1.7K–5K sentences), we evaluate its performance intrinsically on both English-Chinese and English-Arabic alignment, achieving major improvements over unsupervised baselines (11–27 F1). We evaluate the model extrinsically on data projection for Chinese NER, showing that our alignments yield higher performance when used to project NER tags from English to Chinese. Finally, we perform an ablation analysis and an annotation experiment that jointly support the utility and feasibility of future manual alignment elicitation.
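The extrinsic evaluation relies on annotation projection: once word alignments are available, entity tags on the English side can be copied across alignment links to the Chinese side. The paper does not specify its projection procedure, so the following is only a minimal illustrative sketch (it copies tags link-by-link and ignores BIO span consistency, which a real system would repair).

```python
def project_tags(src_tags, alignment, tgt_len):
    """Project per-token NER tags from source to target via word alignment.

    src_tags:  list of BIO tags, one per source token (e.g. "B-PER", "O")
    alignment: iterable of (src_idx, tgt_idx) alignment links
    tgt_len:   number of target tokens
    """
    # Unaligned target tokens default to the outside tag.
    tgt_tags = ["O"] * tgt_len
    for s, t in alignment:
        # Copy each non-"O" source tag across its alignment link.
        if src_tags[s] != "O":
            tgt_tags[t] = src_tags[s]
    return tgt_tags

# Hypothetical example: "John lives in Beijing" aligned to 3 target tokens.
src_tags = ["B-PER", "O", "O", "B-LOC"]
alignment = [(0, 0), (1, 1), (3, 2)]
print(project_tags(src_tags, alignment, 3))  # ['B-PER', 'O', 'B-LOC']
```

Because every projected tag is mediated by an alignment link, alignment errors propagate directly into the projected NER data, which is why higher-quality alignments translate into higher downstream NER performance.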