Noisy Parallel Corpus Filtering through Projected Word Embeddings

We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. In spit ...
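The abstract is truncated, so it does not say how the projection is learned or how sentence pairs are scored. The sketch below is only one common way to realize "projection of word embeddings" for filtering, and it may differ from the authors' method. It assumes three things. First, an orthogonal linear map (orthogonal Procrustes) learned from a seed dictionary of translation pairs. Second, sentence vectors built by averaging word embeddings. Third, a cosine-similarity threshold for keeping a pair. The function names (`learn_projection`, `sentence_vector`, `pair_score`, `filter_corpus`), the threshold, and the toy vocabulary are hypothetical. The toy embeddings are random vectors constructed so that the true mapping is known.

```python
import numpy as np

def learn_projection(src_vecs, tgt_vecs):
    """Orthogonal Procrustes (assumed mapping method): return the orthogonal W
    minimizing ||src_vecs @ W - tgt_vecs||_F. Row i of each matrix holds the
    embeddings of the i-th seed-dictionary translation pair."""
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

def sentence_vector(tokens, emb, dim):
    """Average the embeddings of in-vocabulary tokens; zero vector if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def pair_score(src_tokens, tgt_tokens, src_emb, tgt_emb, W):
    """Cosine similarity between the projected source sentence vector and the
    target sentence vector (hypothetical scoring function)."""
    s = sentence_vector(src_tokens, src_emb, W.shape[0]) @ W
    t = sentence_vector(tgt_tokens, tgt_emb, W.shape[1])
    denom = np.linalg.norm(s) * np.linalg.norm(t)
    return float(s @ t / denom) if denom > 0 else 0.0

def filter_corpus(pairs, src_emb, tgt_emb, W, threshold=0.9):
    """Keep the (source_tokens, target_tokens) pairs whose score reaches the threshold."""
    return [p for p in pairs if pair_score(p[0], p[1], src_emb, tgt_emb, W) >= threshold]

# Toy data: random target-language embeddings, and source embeddings that are an
# exact orthogonal rotation q of them, so the ideal source-to-target map is q.T.
rng = np.random.default_rng(0)
dim = 4
tgt_words = ["the", "dog", "cat", "runs", "sleeps", "house"]
src_words = ["der", "hund", "katze", "läuft", "schläft", "haus"]
tgt_emb = {w: rng.normal(size=dim) for w in tgt_words}
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
src_emb = {s: tgt_emb[t] @ q for s, t in zip(src_words, tgt_words)}

X = np.stack([src_emb[s] for s in src_words])
Y = np.stack([tgt_emb[t] for t in tgt_words])
W = learn_projection(X, Y)

pairs = [
    (["der", "hund", "läuft"], ["the", "dog", "runs"]),  # translation
    (["katze", "schläft"], ["the", "house"]),            # misaligned noise
]
kept = filter_corpus(pairs, src_emb, tgt_emb, W)
```

Because the toy source space is an exact rotation of the target space, the learned `W` recovers `q.T`, and the correctly aligned pair scores a cosine of 1. Real embeddings trained on separate monolingual corpora are only approximately isomorphic, so in practice the threshold would need tuning on held-out data.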