Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes

We address, for the first time, unsupervised training for a translation task with a vocabulary of hundreds of thousands of words. We scale up the expectation-maximization (EM) algorithm to learn a large translation table without any parallel text or seed lexicon. First, we resolve the memory bottleneck and enforce lexicon sparsity with a simple thresholding scheme. Second, we initialize the lexicon training with word classes, which effectively boosts performance. Our methods produce promising results on two large-scale unsupervised translation tasks.
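The thresholding idea mentioned above can be illustrated with a minimal sketch: after each EM iteration, lexicon entries whose probability falls below a cutoff are dropped and the remaining probabilities are renormalized, keeping the translation table sparse and memory-bounded. This is only an assumed toy implementation; the function name `prune_lexicon`, the data layout, and the example words are hypothetical, and the paper's actual scheme may differ in detail.

```python
def prune_lexicon(lexicon, threshold):
    """Enforce sparsity: drop low-probability translations, then renormalize.

    lexicon: dict mapping each source word to a dict of
             {target word: translation probability}.
    Entries with probability below `threshold` are removed; the surviving
    probabilities for each source word are rescaled to sum to 1.
    """
    pruned = {}
    for src, probs in lexicon.items():
        # Keep only entries at or above the threshold.
        kept = {tgt: p for tgt, p in probs.items() if p >= threshold}
        total = sum(kept.values())
        if total > 0:
            # Renormalize the surviving entries to a proper distribution.
            pruned[src] = {tgt: p / total for tgt, p in kept.items()}
    return pruned


# Hypothetical toy lexicon (source word -> translation candidates).
lexicon = {"haus": {"house": 0.70, "home": 0.29, "horse": 0.01}}
pruned = prune_lexicon(lexicon, threshold=0.05)
# "horse" falls below the cutoff and is removed; "house" and "home"
# are renormalized so their probabilities again sum to 1.
```

In a full EM decipherment loop, such a pruning step would run between the M-step and the next E-step, so that only the surviving entries need to be stored and updated.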
