Social media has become a powerful channel for spreading information and has changed the way we receive news; it often reaches audiences faster and farther than any other channel. The availability of large-scale data from online social media motivates our questions about social behavior in spreading information across the network. We conduct extensive experiments on Twitter data to determine how important particular users are in spreading information. The Twitter data used in this study is stored in the Hadoop Distributed File System (HDFS) and processed with algorithms framed in Hadoop's MapReduce paradigm. An information flow network is generated from the Twitter data (81,540,798 tweets) based on the hashtags #Brexit, #Euro and #Rio. We study the effectiveness of the K-core and PageRank algorithms in identifying important seeds in social networks by comparing their outputs against the true seeds obtained from the two-phase algorithm. By most similarity indices, K-core performs better than PageRank.
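The comparison described above can be illustrated with a minimal single-machine sketch. It is not the MapReduce implementation used in the study: the toy edge list, the `true_seeds` set, the edge direction (an edge `(u, v)` is assumed to mean that user `u` picked up a hashtag from user `v`), and the choice of the Jaccard index as the similarity measure are all assumptions made for illustration.

```python
from collections import defaultdict


def core_numbers(edges):
    """Core number of every node, treating edges as undirected (iterative peeling)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        # Repeatedly remove the node with the smallest remaining degree.
        n = min(remaining, key=lambda x: degree[x])
        k = max(k, degree[n])
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core


def pagerank(edges, damping=0.85, iterations=50):
    """PageRank by power iteration on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    size = len(nodes)
    rank = {n: 1.0 / size for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out.get(n))
        new = {n: (1 - damping) / size + damping * dangling / size for n in nodes}
        for u in nodes:
            for v in out.get(u, []):
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def top_k(scores, k):
    """The k highest-scoring nodes; ties are broken by node name."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


def jaccard(a, b):
    """Jaccard similarity between two collections of nodes."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical toy network: (u, v) means u picked up a hashtag from v.
edges = [("b", "a"), ("c", "a"), ("c", "b"), ("d", "a"), ("a", "b")]
true_seeds = {"a", "c"}  # placeholder for the output of the two-phase algorithm

kcore_seeds = top_k(core_numbers(edges), 2)
pagerank_seeds = top_k(pagerank(edges), 2)
print("K-core vs. true seeds:  ", jaccard(kcore_seeds, true_seeds))
print("PageRank vs. true seeds:", jaccard(pagerank_seeds, true_seeds))
```

On a graph this small the two rankings can easily coincide; the sketch is only meant to show the shape of the comparison, in which each algorithm proposes a seed set that is then scored against a reference seed set with a similarity index.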