Robust Embeddings via Distributions

Despite recent monumental advances in the field, many Natural Language Processing (NLP) models still struggle to perform adequately on noisy text. We propose a novel probabilistic embedding-level method to improve the robustness of NLP models. Our method, Robust Embeddings via Distributions (RED), incorporates information from both the noisy token and its surrounding context to obtain distributions over embedding vectors, which can express uncertainty in semantic space more fully than deterministic point embeddings. We evaluate our method on several downstream tasks using existing state-of-the-art models in the presence of both natural and synthetic noise, and demonstrate a clear improvement over other embedding-level approaches to robustness from the literature.
