Paper information - Prediction of popular tweets using Similarity Learning

Prediction of popular tweets using Similarity Learning

Social media is gaining popularity because of how readily it spreads information. Twitter is one of the most powerful sources of information sharing because of its massive user base. Consequently, and because its data is easy to obtain, Twitter has become a popular resource for research purposes such as social engineering, sentiment analysis, and business applications. On Twitter, information may be categorized as important or unimportant. Information that spreads through re-tweets becomes important, or popular. Because popular messages contain vital information for users, the characteristics of such messages are worth studying: they are related to breaking news identification, viral marketing, and similar tasks. In this research, we investigate predicting the popularity of messages as measured by the number of re-tweets. We transform this task into a classification problem and apply an existing Similarity Learning Algorithm (SiLA). SiLA, an extension of the voted perceptron algorithm, learns a similarity matrix for kNN classification before classifying tweets as either popular or unpopular based on content features. We classify tweets in both binary and multi-class settings. In the binary case, a tweet is considered popular if it has been re-tweeted and unpopular otherwise. In the multi-class case, SiLA uses different popularity bands, defined by the re-tweet count. Binary classification achieved 85% accuracy and multi-class classification achieved 73% accuracy. Experimental results show that learning the similarity measure improves accuracy compared with other kNN-based methods using cosine similarity and Euclidean distance.
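The classification setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the band boundaries, the toy feature vectors, and the normalized bilinear form of the similarity are all assumptions made for illustration, and the step in which SiLA learns the matrix A is omitted; A is simply passed in.

```python
from collections import Counter
import math


def label_binary(retweets):
    """Popular if the tweet was re-tweeted at least once, as in the abstract."""
    return "popular" if retweets > 0 else "unpopular"


def label_band(retweets, bounds=(0, 10, 100)):
    """Multi-class label. The band boundaries are hypothetical, not from the paper."""
    for band, upper in enumerate(bounds):
        if retweets <= upper:
            return band
    return len(bounds)


def similarity(x, y, A):
    """Bilinear similarity x^T A y divided by the vector norms (assumed form)."""
    n = len(x)
    raw = sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    return raw / norm if norm else 0.0


def knn_predict(query, examples, A, k=3):
    """Majority vote among the k training examples most similar to the query."""
    ranked = sorted(examples, key=lambda ex: similarity(query, ex[0], A), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# With A set to the identity matrix, the similarity reduces to plain cosine
# similarity, which is one of the baselines the abstract compares against.
identity = [[1.0, 0.0], [0.0, 1.0]]

# Toy content-feature vectors paired with labels derived from re-tweet counts.
train = [
    ([1.0, 0.0], label_binary(12)),
    ([0.9, 0.1], label_binary(3)),
    ([0.0, 1.0], label_binary(0)),
    ([0.1, 0.9], label_binary(0)),
]

prediction = knn_predict([0.8, 0.2], train, identity, k=3)
print(prediction)
```

Passing a learned matrix instead of the identity changes which neighbors rank highest, which is where a learned similarity could differ from the cosine baseline.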

Muhammad Asif Razzaq | Ali Mustafa Qamar | Hasnat Ahmed
