Data Acquisition in Social Networks: Issues and Proposals

The large amount of information that can be gathered from social networks may be useful in contexts ranging from marketing to intelligence. In this paper, we describe the three main techniques for data acquisition in social networks, the conditions under which each can be applied, and the problems that remain open. We then focus on the main issues that crawlers must address to collect data from social networks, and we propose a novel solution that exploits the cloud computing paradigm for crawling. The proposed crawler is modular by design and relies on a large number of distributed nodes and on the MapReduce framework to speed up data collection from large social networks.
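The abstract does not give the crawler's internals, so the following is only a minimal sketch of how a crawl round might be phrased in MapReduce terms. It assumes a breadth-first traversal, and it replaces the remote social network with a hypothetical in-memory graph (`TOY_GRAPH`); the function names `fetch_friends`, `map_phase`, `reduce_phase`, and `crawl` are illustrative and do not come from the paper. In a real deployment the map calls would run in parallel on distributed nodes rather than in a single loop.

```python
from collections import defaultdict

# Hypothetical toy friendship graph standing in for a real social network.
TOY_GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def fetch_friends(user):
    """Stand-in for a remote profile request (an assumption, not a real API)."""
    return TOY_GRAPH.get(user, [])


def map_phase(user):
    """Map: crawl one user and emit a (friend, source) pair per friendship."""
    return [(friend, user) for friend in fetch_friends(user)]


def reduce_phase(pairs, visited):
    """Reduce: group pairs by friend and drop users that were already crawled."""
    grouped = defaultdict(list)
    for friend, source in pairs:
        grouped[friend].append(source)
    return {f: s for f, s in grouped.items() if f not in visited}


def crawl(seed, max_rounds=10):
    """Run map/reduce rounds until no new users are discovered."""
    visited = {seed}
    frontier = [seed]
    for _ in range(max_rounds):
        if not frontier:
            break
        # Each map call is independent, so it could run on a separate node.
        pairs = [pair for user in frontier for pair in map_phase(user)]
        discovered = reduce_phase(pairs, visited)
        frontier = sorted(discovered)
        visited.update(frontier)
    return visited


if __name__ == "__main__":
    print(sorted(crawl("alice")))
```

Deduplicating in the reduce step keeps each user from being fetched more than once, which is one plausible reason to pair a distributed crawler with MapReduce; the actual design in the paper may differ.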
