Noisy Parallel Corpus Filtering through Projected Word Embeddings
[1] Guillaume Lample, et al. Word Translation Without Parallel Data, 2017, ICLR.
[2] Jörg Tiedemann, et al. Efficient Word Alignment with Markov Chain Monte Carlo, 2016, Prague Bull. Math. Linguistics.
[3] Philipp Koehn, et al. Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English, 2019, ArXiv.
[4] Houda Bouamor, et al. H2@BUCC18: Parallel Sentence Extraction from Comparable Corpora Using Multilingual Sentence Embeddings, 2018, BUCC@LREC.
[5] Raivis Skadins, et al. Word Alignment Based Parallel Corpora Evaluation and Cleaning Using Machine Learning Techniques, 2015, EAMT.
[6] Huda Khayrallah, et al. Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering, 2018, WMT.
[7] Marcis Pinnis, et al. Tilde’s Parallel Corpus Filtering Methods for WMT 2018, 2018, WMT.
[8] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.
[9] Alexandros Nanopoulos, et al. Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data, 2010, J. Mach. Learn. Res.
[10] Holger Schwenk, et al. Filtering and Mining Parallel Data in a Joint Multilingual Space, 2018, ACL.
[11] Philipp Koehn, et al. Zipporah: a Fast and Scalable Data Cleaning System for Noisy Web-Crawled Parallel Corpora, 2017, EMNLP.