Empirical Study of Sentence Embeddings for English Sentences Quality Assessment

Novel deep learning and machine translation techniques have greatly advanced the field of computational linguistics, enabling us to find meaningful latent spaces for text analysis. While several embedding techniques exist for words, sentences, and entire documents, their potential applications are still being explored. In this paper we examine how top-performing sentence embedding methods affect the accuracy of a neural model trained to assess the quality of English sentences. We focus on Language Agnostic SEntence Representation (LASER), Sentence to Vector (S2V), and Universal Sentence Encoder (USE), and observe their ability to capture information related to sentence quality. Our study suggests that these state-of-the-art sentence embeddings are unable to capture sufficient information regarding sentence correctness and quality in the English language.
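The general setup described above, in which fixed sentence embeddings are fed to a model trained to predict sentence quality, can be illustrated with a minimal sketch. This is not the model used in the study: a plain logistic-regression probe stands in for the neural model, and the embeddings are synthetic random vectors with an artificially injected signal rather than LASER, S2V, or USE outputs. The function names `make_synthetic_embeddings`, `train_probe`, and `probe_accuracy` are hypothetical, and the resulting accuracy says nothing about the real encoders.

```python
import numpy as np


def make_synthetic_embeddings(n=400, dim=16, seed=0):
    """Create stand-in 'sentence embeddings' with binary quality labels.

    ASSUMPTION: these are random vectors, not real encoder outputs.
    Dimension 0 is shifted by the label so that a probe has a signal to find.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n).astype(float)
    X = rng.normal(size=(n, dim))
    X[:, 0] += 3.0 * (2.0 * y - 1.0)
    return X, y


def train_probe(X, y, lr=0.1, epochs=200):
    """Fit a logistic-regression probe with batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of label 1
        w -= lr * (X.T @ (p - y)) / n           # gradient of the mean log loss
        b -= lr * float(np.mean(p - y))
    return w, b


def probe_accuracy(X, y, w, b):
    """Fraction of examples whose thresholded prediction matches the label."""
    pred = (X @ w + b) > 0
    return float(np.mean(pred == y))


# Train on the first 300 vectors, evaluate on the remaining 100.
X, y = make_synthetic_embeddings()
w, b = train_probe(X[:300], y[:300])
acc = probe_accuracy(X[300:], y[300:], w, b)
print(f"held-out probe accuracy on synthetic data: {acc:.2f}")
```

In a real experiment, `X` would be replaced by the vectors produced by each embedding method for the same labeled sentences, so that differences in held-out accuracy reflect how much quality-related information each embedding carries.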
