Robust to Noise Models in Natural Language Processing Tasks

People in modern life are surrounded by large amounts of noisy text. The traditional way to handle such noise is spelling correction, yet existing solutions are far from perfect. We propose a noise-robust word embedding model that outperforms commonly used models, such as fastText and word2vec, on several tasks. In addition, we investigate the noise robustness of current models in different natural language processing tasks. We propose extensions to modern models for three downstream tasks, namely text classification, named entity recognition, and aspect extraction, and these extensions show improved noise robustness over existing solutions.
