User Influence and Follower Metrics in a Large Twitter Dataset

Social media has become an important channel for conveying information. The microblogging service Twitter, with about 284 million users and currently over 500 million tweets per day, is one example. The site stores every tweet once it is sent so that it can be retrieved later. Its site ontology, i.e. the set of concepts it implements, is rather simple: each user is represented by a profile, users can follow other users, and a user can retweet a received tweet to all of his or her followers. In this paper we investigate the diffusion of messages and the influence of users on other users, mainly on the basis of retweet cascade size and the attenuation patterns inside a cascade. We rely on a large data set collected after the Boston Marathon bombing on April 15, 2013. It contains about 8 million tweets and retweets sent by over 4 million distinct users. It was collected through the Twitter API, which selects all messages containing given keywords, including hashtags. During 2014 we also collected all followers of these users, 7-8 billion in total. The follower relation is also used in some of the influence estimates. The largest cascades originate from the users with the most followers, and a cascade dies out after two or three frequency peaks.
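
To make the two quantities named above concrete, the following is a minimal sketch of how cascade size and frequency peaks could be computed from retweet records. It is an illustration under stated assumptions, not the procedure used in this paper: the record layout (original tweet id, timestamp in seconds), the one-hour bin width, the peak definition (a local maximum of the binned counts), and all function names are hypothetical, and the sample data is made up.

```python
from collections import Counter

# Hypothetical sample records: (original_tweet_id, retweet_time_in_seconds).
# The layout and the values are assumptions made for illustration only.
retweets = [
    ("a", 0), ("a", 100), ("a", 4000),
    ("a", 7300), ("a", 7400), ("a", 7500),
    ("b", 50),
]

def cascade_sizes(records):
    """Count the retweets belonging to each original tweet."""
    return Counter(root for root, _ in records)

def hourly_counts(records, root):
    """Retweets of one original tweet per one-hour bin.

    Bins are clock hours (timestamp // 3600), and the returned list
    starts at the bin that holds the earliest retweet of that tweet.
    """
    hours = [ts // 3600 for r, ts in records if r == root]
    if not hours:
        return []
    start = min(hours)
    counts = Counter(h - start for h in hours)
    return [counts.get(h, 0) for h in range(max(counts) + 1)]

def count_peaks(series):
    """Count local maxima, treating the series as surrounded by zeros.

    A flat plateau is counted once, at its first bin.
    """
    padded = [0] + list(series) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] >= padded[i + 1]
    )

sizes = cascade_sizes(retweets)
series = hourly_counts(retweets, "a")
peaks = count_peaks(series)
```

On real data the binned counts would normally be smoothed before counting peaks, since raw hourly counts are noisy; that step is omitted here to keep the sketch short.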