Challenges of computational verification in social multimedia

Fake or misleading multimedia content and its distribution through social networks such as Twitter constitute an increasingly important and challenging problem, especially in the context of emergencies and critical situations. This paper explores the challenges involved in applying a computational verification framework to automatically classify tweets with unreliable media content as fake or real. We created a corpus of tweets around major events, focusing on those linking to images (fake or real) whose reliability could be verified by independent online sources. After extracting content and user features for each tweet, we measured the accuracy of fake-content prediction using each set of features separately and in combination. We considered three approaches for evaluating the performance of the classifier: standard cross-validation, independent groups of tweets, and cross-event training. With cross-validation, accuracy reached 81% for tweet features and 75% for user features. When different events are used for training and testing, the accuracy is much lower (up to 58%), demonstrating that generalization of the predictor is a very challenging issue.
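The difference between standard cross-validation and cross-event training can be illustrated with a small sketch. The code below is not the authors' implementation: the event names, labels, and the majority-label baseline are hypothetical placeholders chosen only to show how a leave-one-event-out split keeps every tweet of the test event out of the training set.

```python
from collections import defaultdict


def leave_one_event_out(events):
    """Yield (held_out_event, train_indices, test_indices) per distinct event.

    All tweets of the held-out event go to the test set, so the classifier
    never sees that event during training.
    """
    by_event = defaultdict(list)
    for idx, event in enumerate(events):
        by_event[event].append(idx)
    for held_out in sorted(by_event):
        test_idx = by_event[held_out]
        train_idx = sorted(
            i for event, idxs in by_event.items() if event != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def majority_label(train_labels):
    """Hypothetical stand-in classifier: predict the most frequent training label."""
    return max(sorted(set(train_labels)), key=train_labels.count)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy corpus: one event id and one fake/real label per tweet.
events = ["event_a", "event_a", "event_b", "event_b", "event_a", "event_b"]
labels = ["fake", "real", "fake", "fake", "fake", "real"]

results = {}
for held_out, train_idx, test_idx in leave_one_event_out(events):
    predicted = majority_label([labels[i] for i in train_idx])
    y_true = [labels[i] for i in test_idx]
    results[held_out] = accuracy(y_true, [predicted] * len(y_true))

for event, acc in results.items():
    print(f"held-out {event}: accuracy {acc:.2f}")
```

In standard cross-validation, tweets from the same event can land in both the training and test folds, which tends to give optimistic estimates; splitting by event, as above, measures how well a predictor transfers to an unseen event.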
