Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes

We address, for the first time, unsupervised training for a translation task with a vocabulary of hundreds of thousands of words. We scale up the expectation-maximization (EM) algorithm to learn a large translation table without any parallel text or seed lexicon. First, we resolve the memory bottleneck and enforce lexicon sparsity with a simple thresholding scheme. Second, we initialize the lexicon training with word classes, which effectively boosts performance. Our methods produce promising results on two large-scale unsupervised translation tasks.
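The thresholding idea mentioned above can be illustrated with a minimal sketch: after each EM iteration, lexicon entries whose probability falls below a cutoff are dropped and the remaining probabilities are renormalized, keeping the translation table sparse and memory-bounded. This is only an assumed toy implementation; the function name `prune_lexicon`, the data layout, and the example words are hypothetical, and the paper's actual scheme may differ in detail.

```python
def prune_lexicon(lexicon, threshold):
    """Enforce sparsity: drop low-probability translations, then renormalize.

    lexicon: dict mapping each source word to a dict of
             {target word: translation probability}.
    Entries with probability below `threshold` are removed; the surviving
    probabilities for each source word are rescaled to sum to 1.
    """
    pruned = {}
    for src, probs in lexicon.items():
        # Keep only entries at or above the threshold.
        kept = {tgt: p for tgt, p in probs.items() if p >= threshold}
        total = sum(kept.values())
        if total > 0:
            # Renormalize the surviving entries to a proper distribution.
            pruned[src] = {tgt: p / total for tgt, p in kept.items()}
    return pruned


# Hypothetical toy lexicon (source word -> translation candidates).
lexicon = {"haus": {"house": 0.70, "home": 0.29, "horse": 0.01}}
pruned = prune_lexicon(lexicon, threshold=0.05)
# "horse" falls below the cutoff and is removed; "house" and "home"
# are renormalized so their probabilities again sum to 1.
```

In a full EM decipherment loop, such a pruning step would run between the M-step and the next E-step, so that only the surviving entries need to be stored and updated.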
