Collecting Influencers: A Comparative Study of Online Network Crawlers

Crawling an online network to collect data takes considerable effort from researchers. One common crawling task is the identification of important nodes, which has applications ranging from viral marketing to preventing the spread of disease. Various crawling algorithms have been proposed, but their efficiency has not been well studied. In this paper we compare six known crawlers on the task of collecting a fraction of the most influential nodes of a graph. We analyze crawler behavior for four measures of node influence: node degree, k-coreness, betweenness centrality, and eccentricity. The experiments confirm that greedy methods perform best in many settings, but there are cases in which they are very inefficient.
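To make the task concrete, the following is a minimal sketch of one greedy strategy and of the evaluation it is judged by. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not name the six crawlers, so the greedy rule here (always crawl the frontier node with the most links already seen from crawled nodes) is an assumed stand-in. The names `build_graph`, `greedy_crawl`, `top_degree_nodes`, `influencer_coverage`, and the toy edge list are all hypothetical. Node degree is used as the influence measure because it is the simplest of the four to compute.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def greedy_crawl(graph: Graph, seed: int, budget: int) -> List[int]:
    """Crawl at most `budget` nodes, starting from `seed`.

    Assumed greedy rule: among nodes seen but not yet crawled, pick the
    one with the largest observed degree (number of crawled neighbors).
    Ties are broken by the smallest node id.
    """
    crawled: List[int] = []
    crawled_set: Set[int] = set()
    observed_degree: Dict[int, int] = {seed: 0}  # the frontier
    while observed_degree and len(crawled) < budget:
        node = max(observed_degree, key=lambda n: (observed_degree[n], -n))
        del observed_degree[node]
        crawled.append(node)
        crawled_set.add(node)
        # Crawling a node reveals its neighbors.
        for neighbor in graph[node]:
            if neighbor not in crawled_set:
                observed_degree[neighbor] = observed_degree.get(neighbor, 0) + 1
    return crawled


def top_degree_nodes(graph: Graph, fraction: float) -> Set[int]:
    """Return the top `fraction` of nodes ranked by true degree."""
    k = max(1, round(fraction * len(graph)))
    ranked = sorted(graph, key=lambda n: (-len(graph[n]), n))
    return set(ranked[:k])


def influencer_coverage(graph: Graph, crawled: List[int], fraction: float) -> float:
    """Share of the top-`fraction` degree nodes that the crawl collected."""
    target = top_degree_nodes(graph, fraction)
    return len(target & set(crawled)) / len(target)


# Toy graph (hypothetical): two hubs, 0 and 5, joined through node 4.
toy_graph = build_graph(
    [(0, 1), (0, 2), (0, 3), (0, 4), (5, 4), (5, 6), (5, 7), (5, 8)]
)

small_crawl = greedy_crawl(toy_graph, seed=1, budget=3)
large_crawl = greedy_crawl(toy_graph, seed=1, budget=6)
print(influencer_coverage(toy_graph, small_crawl, fraction=0.2))
print(influencer_coverage(toy_graph, large_crawl, fraction=0.2))
```

On this toy graph the crawler reaches the first hub quickly but spends several requests on its low-degree neighbors before it discovers the second hub, a small-scale example of how a greedy rule can be efficient in one region of a graph and wasteful in another.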
