On the Effectiveness of Feature Set Augmentation Using Clusters of Word Embeddings

Word clusters have been empirically shown to offer important performance improvements on various tasks. Despite their importance, their incorporation into the standard feature-engineering pipeline relies largely on a trial-and-error procedure in which one evaluates several hyper-parameters, such as the number of clusters to be used. To better understand the role of such features, we systematically evaluate their effect on four tasks: named entity segmentation, named entity classification, five-point sentiment classification, and five-point sentiment quantification. Our results strongly suggest that cluster membership features improve performance.
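As a rough illustration of what cluster membership features look like in practice, the sketch below clusters pre-trained word vectors with k-means and maps each token to the id of its cluster, which can then be appended to a task's feature set. The toy vocabulary, the random vectors standing in for real embeddings, and the cluster count are assumptions for illustration only, not the paper's actual setup.

```python
# Minimal sketch (not the paper's exact pipeline): derive cluster-membership
# features by running k-means over pre-trained word embeddings and mapping
# each token to its cluster id. Embeddings and the cluster count are toy values.
import numpy as np
from sklearn.cluster import KMeans

# Toy embedding table: word -> dense vector (in practice, word2vec/GloVe vectors).
vocab = ["good", "great", "bad", "awful", "city", "london"]
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=50) for w in vocab}

# Cluster the embedding space; the number of clusters is the hyper-parameter
# that is typically tuned by trial and error.
X = np.vstack([embeddings[w] for w in vocab])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
word2cluster = {w: int(c) for w, c in zip(vocab, kmeans.labels_)}

def cluster_features(tokens, unk=-1):
    """Return one cluster-membership feature per token (unk for OOV words)."""
    return [f"cluster={word2cluster.get(t, unk)}" for t in tokens]

print(cluster_features(["london", "is", "great"]))
# e.g. ['cluster=2', 'cluster=-1', 'cluster=0'] -- these categorical features
# can be added alongside the standard features of an NER or sentiment model.
```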
