Prediction of popular tweets using Similarity Learning

Social media is gaining popularity because it spreads information quickly. Twitter is one of the most powerful sources of information sharing because of its massive user base. Because its data are easy to obtain, Twitter has become a popular resource for research in areas such as social engineering, sentiment analysis, and business applications. On Twitter, information may be categorized as important or unimportant: information that spreads through re-tweets becomes important, or popular. Because popular messages contain vital information for users, studying their characteristics is relevant to breaking news identification, viral marketing, and similar tasks. In this research, we investigate predicting the popularity of messages as measured by the number of re-tweets. We cast this task as a classification problem and apply an existing Similarity Learning Algorithm (SiLA). SiLA, an extension of the voted perceptron algorithm, learns a similarity matrix for kNN classification, which is then used to classify tweets as popular or unpopular based on content features. We consider both binary and multi-class classification. In the binary case, a tweet is popular if it has been re-tweeted and unpopular otherwise. In the multi-class case, SiLA uses different popularity bands defined by the re-tweet count. Binary classification achieved 85% accuracy and multi-class classification achieved 73% accuracy. Experimental results show that learning the similarity measure improves accuracy compared with other kNN-based methods that use cosine similarity or Euclidean distance.
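As an illustration of the classification setup described above, the following minimal sketch shows how tweets might be labeled from re-tweet counts and classified by kNN under a matrix-parameterized similarity. It is not the SiLA implementation: the function names, the band thresholds, and the normalized bilinear form x^T A y / (|x| |y|) are assumptions made for illustration, and the perceptron-style learning of the matrix A is not shown. With A set to the identity matrix, the similarity reduces to plain cosine similarity, the baseline mentioned in the abstract.

```python
import math
from collections import Counter


def popularity_label(retweets, bands=None):
    """Map a re-tweet count to a class label.

    Binary case (bands is None): 1 if re-tweeted at least once, else 0.
    Multi-class case: the label is the number of band thresholds reached.
    The thresholds are hypothetical, not the bands used in the paper.
    """
    if bands is None:
        return int(retweets > 0)
    return sum(retweets >= threshold for threshold in bands)


def similarity(A, x, y):
    """Assumed bilinear similarity x^T A y, normalized by the vector norms."""
    bilinear = sum(
        x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y))
    )
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    return bilinear / norm if norm > 0 else 0.0


def knn_predict(A, train, x, k=3):
    """Majority vote among the k training examples most similar to x.

    train is a list of (feature_vector, label) pairs.
    """
    ranked = sorted(
        train, key=lambda item: similarity(A, item[0], x), reverse=True
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy content features; real features would come from the tweet text.
    train = [
        ([1.0, 0.0], popularity_label(12)),
        ([0.9, 0.1], popularity_label(3)),
        ([0.0, 1.0], popularity_label(0)),
        ([0.1, 0.9], popularity_label(0)),
    ]
    identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for a learned matrix
    print(knn_predict(identity, train, [1.0, 0.2], k=3))
```

In a similarity learning setting, the identity matrix here would be replaced by a matrix learned from the training data, so that neighbors sharing the query's class score higher than neighbors from other classes.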