Whose posts to read: Finding social sensors for effective information acquisition

Abstract In the era of big data, deciding which information to receive and which to filter out is extremely challenging, particularly in social media, where large volumes of User Generated Content (UGC) are disseminated widely and quickly. Because each user in a social network can take actions that drive information diffusion, it is natural to aggregate spreading information at the individual level by treating each user as a social sensor. Along this line, we propose a framework for effective information acquisition in social media. Specifically, we introduce a novel measure, the preference-based Detection Ability, to evaluate the ability of social sensors to detect diffusing events; the problem of effective information acquisition then reduces to achieving social sensing maximization by discovering valid social sensors. In pursuit of social sensing maximization, we propose two algorithms that address longstanding problems of traditional greedy methods in terms of efficiency and performance. On the one hand, we propose an efficient algorithm termed LeCELF, which addresses the redundant re-evaluations in the traditional Cost-Effective Lazy Forward (CELF) algorithm. On the other hand, we observe the participation paradox phenomenon in the social sensing network and propose a randomized selection-based algorithm called FRIENDOM, which chooses social sensors so as to improve the effectiveness of information acquisition. Experiments on a disease spreading network and real-world microblog datasets show that LeCELF greatly reduces the running time, whereas FRIENDOM achieves better detection performance. The proposed framework and corresponding algorithms can be applied in many other settings to address information overload problems.
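For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the standard CELF lazy-forward greedy idea, not the LeCELF algorithm proposed in the paper. It assumes a plain coverage objective (the number of distinct events detected by the chosen sensors) as a stand-in for the preference-based Detection Ability, and the names `celf_select`, `coverage`, and `budget` are hypothetical, introduced only for illustration.

```python
import heapq


def celf_select(coverage, budget):
    """Pick up to `budget` sensors by lazy greedy selection.

    coverage: dict mapping a sensor id to the set of events it detects
              (an assumed toy objective, not the paper's measure).
    Returns (selected sensors in pick order, set of covered events).
    """
    covered = set()
    selected = []
    # Max-heap via negated gains. Each entry stores the round in which
    # its marginal gain was last computed.
    heap = [(-len(events), sensor, 0) for sensor, events in coverage.items()]
    heapq.heapify(heap)

    while heap and len(selected) < budget:
        neg_gain, sensor, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date. By submodularity no stale entry can
            # exceed it, so this sensor is the greedy choice.
            if neg_gain == 0:
                break  # nothing left to gain
            selected.append(sensor)
            covered |= coverage[sensor]
        else:
            # Stale gain: re-evaluate only this sensor and push it back.
            fresh_gain = len(coverage[sensor] - covered)
            heapq.heappush(heap, (-fresh_gain, sensor, len(selected)))

    return selected, covered


# Toy data, made up for illustration only.
toy_coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
picked, detected = celf_select(toy_coverage, budget=2)
```

The lazy step relies on marginal gains never increasing as the selected set grows, so a sensor whose refreshed gain still tops the heap can be chosen without re-evaluating the rest. The re-evaluations that remain in this scheme are the kind of cost the abstract says LeCELF targets.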
