Paper information - UofL at SemEval-2016 Task 4: Multi Domain word2vec for Twitter Sentiment Classification

In this paper, we present a transfer learning system for Twitter sentiment classification and compare its performance across feature sets that include different word representation vectors. We used data from a different source domain to improve the performance of our system in the target domain. Our approach was to train several word2vec models on the combined source-domain and target-domain data, use each model to compute the average of all the word vectors in a tweet, and feed that average word vector to our classifiers as a training feature. We also developed one doc2vec model that was trained only on the positive, negative, and neutral tweets in the target domain. As a preprocessing step, we used these models to calculate the average word vector for every tweet in the training set. The final evaluation results show that our approach achieved a prediction accuracy on the Twitter2016 test dataset that outperformed two of the teams ranked in the top 10 by AvgF1 score.
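The averaging step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `TOY_VECTORS` table is a hypothetical stand-in for a trained word2vec model, the function name `average_word_vector` is invented here, and skipping out-of-vocabulary tokens is an assumption the abstract does not confirm.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embeddings standing in for a trained word2vec model.
TOY_VECTORS: Dict[str, List[float]] = {
    "good": [1.0, 0.0, 2.0],
    "movie": [0.0, 2.0, 0.0],
    "bad": [-1.0, 0.0, -2.0],
}


def average_word_vector(
    tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of all in-vocabulary tokens of one tweet."""
    total = [0.0] * dim
    known = 0
    for token in tokens:
        vec = vectors.get(token)
        if vec is None:
            # Assumption: out-of-vocabulary tokens are simply skipped.
            continue
        for i, value in enumerate(vec):
            total[i] += value
        known += 1
    if known == 0:
        # No known tokens: fall back to the zero vector.
        return total
    return [value / known for value in total]


# One feature vector per tweet; this would be passed to a classifier.
features = average_word_vector("good movie tonight".split(), TOY_VECTORS, dim=3)
```

In practice the lookup table would come from a model trained on the combined source and target corpora, and the resulting vectors would form the feature matrix for the sentiment classifier.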

Adel Said Elmaghraby | Omar Abdelwahab

[1] Ken-ichi Kawarabayashi, et al. Unsupervised Cross-Domain Word Representation Learning, 2015, ACL.

[2] Quoc V. Le, et al. Distributed Representations of Sentences and Documents, 2014, ICML.

[3] Preslav Nakov, et al. SemEval-2015 Task 10: Sentiment Analysis in Twitter, 2015, SemEval.

[4] Bing Liu, et al. Mining and summarizing customer reviews, 2004, KDD.