Paper information - STAVICTA Group Report for RepLab 2014 Reputation Dimension Task

In this paper we present our experiments on the RepLab 2014 Reputation Dimension task. RepLab is a competitive challenge for Reputation Management Systems. The RepLab 2014 reputation dimensions task focuses on categorizing Twitter messages with regard to standard reputation dimensions (such as performance, leadership, or innovation). Our approach relies only on the textual content of tweets and ignores both metadata and the content of URLs within tweets. We carried out several experiments focusing on different feature sets, including bag of n-grams, distributional semantics features, and deep neural network representations. The results show that bag-of-bigram features with minimum frequency thresholding work quite well on the reputation dimension task, especially with regard to the average F1 measure over all dimensions, where two of our four submitted runs achieve the highest and second-highest scores. Our experiments also show that semi-supervised recursive autoencoders outperform the other feature sets used in our experiments with regard to the accuracy measure and are a promising subject of future research.
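To make the phrase "bag of bigram features with minimum frequency thresholding" concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The whitespace tokenizer, the lowercasing, the threshold value `min_count=2`, the use of total corpus counts for the threshold, and the example tweets are all assumptions chosen for illustration.

```python
from collections import Counter


def bigrams(text):
    """Return adjacent token pairs from a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def build_vocabulary(texts, min_count=2):
    """Keep only bigrams whose total count over the corpus is at least min_count.

    min_count is a hypothetical threshold; the abstract does not state the value used.
    """
    counts = Counter(bg for text in texts for bg in bigrams(text))
    kept = sorted(bg for bg, c in counts.items() if c >= min_count)
    return {bg: i for i, bg in enumerate(kept)}


def vectorize(text, vocabulary):
    """Map a text to a count vector over the retained bigrams."""
    vec = [0] * len(vocabulary)
    for bg in bigrams(text):
        idx = vocabulary.get(bg)
        if idx is not None:
            vec[idx] += 1
    return vec


# Toy corpus (invented for illustration, not RepLab data).
tweets = [
    "new phone launch next week",
    "the new phone looks great",
    "ceo steps down next week",
]
vocab = build_vocabulary(tweets, min_count=2)
features = [vectorize(t, vocab) for t in tweets]
```

Only the bigrams that occur at least twice in the toy corpus survive the threshold, so each tweet is reduced to a short count vector. Such vectors could then be passed to any standard classifier to predict a reputation dimension.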
