Understanding factors that affect response rates in Twitter

In information networks where users send messages to one another, the issue of information overload naturally arises: which are the most important messages? In this paper we study the problem of understanding the importance of messages in Twitter. We approach this problem in two stages. First, we perform an extensive characterization of a very large Twitter dataset, which includes all users, social relations, and messages posted from the beginning of the service up to August 2009. We show evidence that information overload is present: users sometimes have to search through hundreds of messages to find those that are interesting enough to reply to or retweet. We then identify factors that influence the probability that a user replies or retweets: previous responses to the same tweeter, the tweeter's sending rate, the age of the tweet, and some basic text elements of the tweet. In the second stage, we show that some of these factors can be used to improve the order in which tweets are presented to the user. First, by inspecting user activity over time, we construct a simple on-off model of user behavior that allows us to infer when a user is actively using Twitter. Then, we explore two machine learning methods for ranking tweets: a Naive Bayes predictor and a Support Vector Machine classifier. We show that it is possible to reorder tweets to increase the fraction of replied or retweeted messages appearing in the first p positions of the list by as much as 50-60%.
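To make the evaluation measure concrete, the following minimal sketch computes the fraction of replied or retweeted messages that fall in the first p positions of a list, before and after reordering by a score. It is an illustration under stated assumptions, not the paper's implementation: the function names `fraction_in_top_p` and `reorder`, the toy timeline, and the scores are all hypothetical, and the measure is read here as "share of all responded-to tweets that land in the top p", which is one plausible interpretation of the abstract's wording.

```python
from typing import List, Sequence


def fraction_in_top_p(responded: Sequence[bool], p: int) -> float:
    """Share of all responded-to tweets that appear in the first p positions."""
    total = sum(responded)
    if total == 0:
        return 0.0
    return sum(responded[:p]) / total


def reorder(responded: Sequence[bool], scores: Sequence[float]) -> List[bool]:
    """Return the responded flags sorted by descending score (hypothetical ranker output)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [responded[i] for i in order]


# Toy timeline in chronological order (hypothetical data): True marks a tweet
# that the user replied to or retweeted.
chronological = [False, False, True, False, True, False, False, True]
# Hypothetical scores, e.g. from a Naive Bayes predictor or an SVM.
scores = [0.1, 0.2, 0.9, 0.3, 0.4, 0.1, 0.2, 0.8]

baseline = fraction_in_top_p(chronological, 3)
ranked = fraction_in_top_p(reorder(chronological, scores), 3)
print(f"baseline={baseline:.2f} ranked={ranked:.2f}")
```

In this toy case the reordering moves all three responded-to tweets into the first three positions, while the chronological order places only one of them there.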
