Tracing Information Flow and Analyzing the Effects of Incomplete Data in Social Media

Although relatively new on the scene, social media has become a powerful force, growing fast in scope, audience, and influence. Social media data comes in many forms: blogs, micro-blogs, social networking, wikis, social bookmarking, social news, reviews, and multimedia sharing. Online social media represents a fundamental shift in how information is produced, transferred, and consumed. Today, online information reaches us in small increments from real-time sources and through social networks. This paper investigates information flow through social media by analyzing the underlying mechanisms for the real-time spread of information through online networks, along with mechanisms that can be used to correct the effects and biases arising from incomplete and missing data. The methods we study for tracing information flow include cascading links to articles, URLs, and hashtags on Twitter. We also address the problem of missing data in information cascades. Our studies show that the k-tree model is an effective tool for studying the effects of missing data in cascades.
