Classification-Based Self-Learning for Weakly Supervised Bilingual Lexicon Induction

Effective projection-based cross-lingual word embedding (CLWE) induction critically relies on an iterative self-learning procedure, which gradually expands a small initial seed dictionary to learn improved cross-lingual mappings. In this work, we present ClassyMap, a classification-based approach to self-learning that yields a more robust and more effective induction of projection-based CLWEs. Unlike prior self-learning methods, our approach allows for the integration of diverse features into the iterative process. We demonstrate the benefits of ClassyMap for bilingual lexicon induction: it yields consistent improvements in a weakly supervised setup (500 seed translation pairs) on a benchmark covering 28 language pairs.
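The abstract describes the core self-learning loop: learn a projection from the current dictionary, mine candidate translation pairs, filter them with a classifier over diverse features, and repeat with the expanded dictionary. The sketch below illustrates that loop under stated assumptions; it is not the paper's exact ClassyMap implementation, and the orthogonal Procrustes mapping, the two similarity-based features, and the random-negative sampling are all illustrative choices.

```python
# Minimal sketch of classification-based self-learning for bilingual lexicon
# induction (BLI). All names, features, and thresholds are illustrative
# assumptions; this is not the paper's exact ClassyMap configuration.
import numpy as np
from sklearn.linear_model import LogisticRegression


def normalize(X):
    """Length-normalize rows so dot products equal cosine similarities."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def procrustes(X_src, X_trg):
    """Orthogonal W minimizing ||X_src W - X_trg||_F (closed-form Procrustes)."""
    U, _, Vt = np.linalg.svd(X_src.T @ X_trg)
    return U @ Vt


def featurize(sims, pairs):
    """Toy feature set: cosine similarity and margin over the 2nd nearest neighbour."""
    second_best = np.sort(sims, axis=1)[:, -2]
    return np.array([[sims[s, t], sims[s, t] - second_best[s]] for s, t in pairs])


def classy_self_learning(src_emb, trg_emb, seed_pairs, n_iters=5, threshold=0.9):
    """Iteratively expand a small seed dictionary with classifier-filtered pairs."""
    src_emb, trg_emb = normalize(src_emb), normalize(trg_emb)
    dictionary = list(seed_pairs)            # [(src_index, trg_index), ...]
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        # 1) Learn a linear projection from the current dictionary.
        s_idx, t_idx = map(list, zip(*dictionary))
        W = procrustes(src_emb[s_idx], trg_emb[t_idx])
        sims = normalize(src_emb @ W) @ trg_emb.T

        # 2) Mine candidate pairs via nearest-neighbour retrieval.
        candidates = list(enumerate(sims.argmax(axis=1)))

        # 3) Train the pair classifier: dictionary entries are positives,
        #    randomly mismatched pairs act as (noisy) negatives.
        negatives = [(s, rng.integers(trg_emb.shape[0])) for s, _ in dictionary]
        X = np.vstack([featurize(sims, dictionary), featurize(sims, negatives)])
        y = [1] * len(dictionary) + [0] * len(negatives)
        clf = LogisticRegression().fit(X, y)

        # 4) Keep only high-confidence candidates and grow the dictionary.
        probs = clf.predict_proba(featurize(sims, candidates))[:, 1]
        accepted = [c for c, p in zip(candidates, probs) if p >= threshold]
        dictionary = sorted({*map(tuple, dictionary), *map(tuple, accepted)})
    return dictionary, W
```

A call like `classy_self_learning(src_vectors, trg_vectors, seed_pairs_500)` would mirror the weakly supervised setup with 500 seed translation pairs, growing the dictionary at every iteration; in practice, the classifier would draw on richer signals than the two toy features used here, in line with the abstract's emphasis on integrating diverse features.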
