The importance of seed nodes in spreading information in social networks: A case study

Social media has become a powerful channel for spreading information and has changed the way we receive news; it often reaches people faster and farther than any other channel. The availability of large-scale data from online social media motivates our questions about how social behavior shapes the spread of information on a network. We conduct extensive experiments on Twitter data to determine how important particular users are in spreading information. The Twitter data used in this study is stored on the Hadoop Distributed File System (HDFS) and processed using algorithms framed in Hadoop's MapReduce paradigm. An information flow network is generated from the Twitter data (81,540,798 tweets) based on the hashtags #Brexit, #Euro and #Rio. We study the effectiveness of the K-core and PageRank algorithms in identifying important seeds in social networks by comparing their outputs against the true seeds obtained from the two-phase algorithm. Compared with PageRank, K-core performs better on most similarity indices.
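
The comparison of K-core and PageRank against a set of true seeds can be illustrated with a minimal, single-machine sketch. It is not the MapReduce implementation used in the study, and it does not implement the two-phase algorithm. The toy edge list, the `true_seeds` set, the edge direction (retweeter, original author), and the choice of the Jaccard index as one possible similarity index are all assumptions made for illustration.

```python
from collections import defaultdict


def core_numbers(edges):
    """Core number of every node, treating the graph as undirected.

    Repeatedly peels the node of minimum remaining degree (simple O(n^2) form).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        n = min(remaining, key=lambda x: degree[x])
        k = max(k, degree[n])  # core numbers never decrease while peeling
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on directed edges (u -> v)."""
    nodes = sorted({n for e in edges for n in e})
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def top_k(scores, k):
    """The k highest-scoring nodes (ties broken by node name)."""
    return set(sorted(scores, key=lambda n: (-scores[n], n))[:k])


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical toy data: (retweeter, original_author) pairs.
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("b", "c"),
         ("c", "d"), ("e", "d"), ("f", "e")]
true_seeds = {"a", "d"}  # assumed ground truth, for illustration only

k = len(true_seeds)
kcore_seeds = top_k(core_numbers(edges), k)
pagerank_seeds = top_k(pagerank(edges), k)
print("K-core vs true seeds:  ", jaccard(kcore_seeds, true_seeds))
print("PageRank vs true seeds:", jaccard(pagerank_seeds, true_seeds))
```

On a graph of the size described above, the same per-node computations would have to be expressed as iterated MapReduce jobs rather than in-memory loops.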