SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings

Word alignments are useful for tasks such as statistical and neural machine translation (NMT) and annotation projection. Statistical word aligners perform well, as do methods that extract alignments jointly with translations in NMT. However, most approaches require parallel training data, and quality decreases as less training data is available. We propose word alignment methods that require no parallel data. The key idea is to leverage multilingual word embeddings, both static and contextualized, for word alignment. Our multilingual embeddings are created from monolingual data only, without relying on any parallel data or dictionaries. We find that alignments created from embeddings are competitive with, and mostly superior to, those of traditional statistical aligners, even in scenarios with abundant parallel data. For example, for a set of 100k parallel sentences, contextualized embeddings achieve a word alignment F1 for English-German that is more than 5% higher (absolute) than that of eflomal, a high-quality alignment model.
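
The key idea above, aligning words by comparing their multilingual embeddings, can be illustrated with a minimal sketch. The abstract does not say how alignments are extracted from the embeddings, so the mutual-argmax rule below is an assumption chosen for illustration, as are the function names and the toy vectors; in practice the vectors would come from a multilingual embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_argmax_alignment(src_vecs, tgt_vecs):
    """Align source token i with target token j when each is the other's
    most similar token. Illustrative extraction rule, not taken from the abstract."""
    if not src_vecs or not tgt_vecs:
        return []
    # Similarity matrix: rows are source tokens, columns are target tokens.
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    # Best target index for every source token.
    best_tgt = [max(range(len(row)), key=row.__getitem__) for row in sim]
    # Best source index for every target token.
    best_src = [
        max(range(len(sim)), key=lambda i: sim[i][j])
        for j in range(len(tgt_vecs))
    ]
    # Keep only the pairs on which both directions agree.
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


# Toy embeddings (hypothetical values) for a three-token sentence pair
# in which the second and third words swap order in the translation.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.1, 0.9, 0.1]]

alignment = mutual_argmax_alignment(src, tgt)
print(alignment)
```

Because the rule only compares embeddings of the two sentences at hand, it needs no parallel training data; the quality of the result depends entirely on how well the embedding space matches words across languages.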
