Some Like it Hoax: Automated Fake News Detection in Social Networks

In recent years, the reliability of information on the Internet has emerged as a crucial issue of modern society. Social network sites (SNSs) have revolutionized the way in which information is spread by allowing users to freely share content. As a consequence, SNSs are also increasingly used as vectors for the diffusion of misinformation and hoaxes. The amount of disseminated information and the rapidity of its diffusion make it practically impossible to assess reliability in a timely manner, highlighting the need for automatic hoax detection systems. As a contribution towards this objective, we show that Facebook posts can be classified with high accuracy as hoaxes or non-hoaxes on the basis of the users who "liked" them. We present two classification techniques, one based on logistic regression, the other on a novel adaptation of boolean crowdsourcing algorithms. On a dataset consisting of 15,500 Facebook posts and 909,236 users, we obtain classification accuracies exceeding 99% even when the training set contains less than 1% of the posts. We further show that our techniques are robust: they work even when we restrict our attention to the users who like both hoax and non-hoax posts. These results suggest that mapping the diffusion pattern of information can be a useful component of automatic hoax detection systems.
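
The like-based idea can be illustrated with a minimal sketch: each post is represented by the set of users who liked it, and a logistic regression learns one weight per user. This is not the authors' implementation. The toy data, the user ids, the function names `train_logistic` and `predict`, and the plain stochastic gradient descent settings are all assumptions made for illustration.

```python
import math

def train_logistic(posts, labels, epochs=200, lr=0.5):
    """Fit a logistic regression with one binary feature per user.

    posts  : list of sets of user ids (the users who liked each post)
    labels : list of ints, 1 = hoax, 0 = non-hoax
    Hypothetical helper: plain SGD, no regularization.
    """
    weights = {}   # user id -> learned weight
    bias = 0.0
    for _ in range(epochs):
        for likers, y in zip(posts, labels):
            z = bias + sum(weights.get(u, 0.0) for u in likers)
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of hoax
            grad = p - y                     # gradient of the log loss w.r.t. z
            bias -= lr * grad
            for u in likers:
                weights[u] = weights.get(u, 0.0) - lr * grad
    return weights, bias

def predict(weights, bias, likers):
    """Return 1 (hoax) if the summed evidence is positive, else 0."""
    z = bias + sum(weights.get(u, 0.0) for u in likers)
    return 1 if z > 0 else 0

# Toy training set (made up): u1-u3 like hoax posts, u4-u6 like non-hoax posts.
train_posts = [{"u1", "u2"}, {"u2", "u3"}, {"u4", "u5"}, {"u5", "u6"}]
train_labels = [1, 1, 0, 0]
weights, bias = train_logistic(train_posts, train_labels)

# Unseen posts, classified only from who liked them.
hoax_guess = predict(weights, bias, {"u1", "u3"})
non_hoax_guess = predict(weights, bias, {"u4", "u6"})
```

Users never seen in training contribute a weight of zero here, so a post liked only by unknown users is decided by the bias alone. A real system would need a deliberate policy for that case.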
